ABSTRACT

The scope of this thesis is twofold. The first is to
provide a methodology for the performance measurement of
database systems. The second is the application of this
methodology to a specific database system in an attempt to
verify the applicability of this methodology and the
performance and capacity claims of the database system.

As a methodology, the thesis describes the strategies
and locations for the placement of checkpoints, the kinds of
performance data to be collected, the environment for the
conduct of the performance measurement and the interpretation
of the results. One of the most important contributions
of this methodology is its capability to obtain actual
measurement overhead, making the presentation of truly
accurate results possible. As an application of this
methodology, we attempt to validate the performance and capacity
claims of an experimental multi-backend database system
known as MDBS. Surprisingly, these claims have been
validated.

I. INTRODUCTION

A. A THESIS OVERVIEW

The scope of this thesis is twofold. The first is to
provide a methodology to use in the performance measurement
of a database computer. The second is the application of
this methodology to a specific database system and the
attempt to verify the performance and capacity claims of the
target system.

The database system being evaluated is an experimental
multi-backend database system known as MDBS. The basic
design goal of MDBS is to develop an architecture which
spreads the work of the database management among multiple
backends. MDBS makes two basic claims in its design. The
first is that by increasing the number of backends used as a
part of the database computer and by keeping the size of the
database constant, the response time of the same transactions
is proportionally decreased. The second claim is
that by increasing the number of backends and also
increasing the size of the database, the response time
remains relatively constant.
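The two claims can be illustrated with a small idealized model. The
sketch below is not taken from the thesis: the function name and the
assumption that response time is strictly proportional to the amount
of data each backend must process are ours, chosen only to show what
"proportionally decreased" and "relatively constant" would mean in
the ideal case.

```python
def ideal_response_time(base_time, base_backends, base_size, backends, size):
    """Idealized response time under the two MDBS claims.

    Assumption (not from the thesis): response time is proportional to
    the amount of data per backend, i.e. size / backends.
    """
    if base_backends <= 0 or backends <= 0 or base_size <= 0:
        raise ValueError("backend counts and base size must be positive")
    work_ratio = (size / backends) / (base_size / base_backends)
    return base_time * work_ratio


# Claim 1: same database, twice the backends -> half the response time.
halved = ideal_response_time(10.0, 1, 1000, backends=2, size=1000)

# Claim 2: database and backends both doubled -> unchanged response time.
unchanged = ideal_response_time(10.0, 1, 1000, backends=2, size=2000)
```

Measured results can then be compared against these ideal values to
see how closely a real system approaches them.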

To conduct the performance measurement of MDBS, various
checkpoints and data collections are incorporated into the
system. Although all checkpoints and data collections are
selected to provide the greatest amount of useful information
and to incur the least amount of overhead, some overhead
is unavoidable. A quantitative method for measuring
the overhead incurred is therefore provided. The performance
results of MDBS are then accurately adjusted using the
overhead calculation. In this way, a truly accurate measurement
of the system may be obtained.

As a methodology, the thesis describes the strategies
and locations of the checkpoint placement, the kinds of
performance data collected, the ways in which the performance
measurements were conducted and the interpretation of the
results. Perhaps of greatest importance is the ability to
calculate actual measurement overhead, allowing for the
presentation of truly accurate results.

In this thesis, we will focus our attention on the
response time of the work being done by the database system.
We will not focus on the throughput. Whereas the throughput
is defined as the average number of user requests executed
by the system in a second, the response time of a request is
the time between the initial issuance of the request by a
user and the final receipt of the entire response set of
this request by the user [Ref. 1]. Since the majority of
the requests processed by a database system are requests for
the retrieval of information, another limitation is made to
the scope of this thesis. We will focus on the performance
measurement of the response time of retrieval requests in
MDBS. Hopefully, these evaluations will verify the claims
of MDBS and also provide a general methodology for the
performance measurement of any database system.
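The distinction between response time and throughput drawn above can
be made concrete with a minimal sketch. The record layout and the
function names are illustrative assumptions, not part of MDBS.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    issued_at: float    # seconds; when the user issued the request
    received_at: float  # seconds; when the entire response set arrived


def response_time(record):
    """Time between issuance and receipt of the entire response set."""
    return record.received_at - record.issued_at


def throughput(records):
    """Average number of requests completed per second over the run."""
    if not records:
        return 0.0
    start = min(r.issued_at for r in records)
    end = max(r.received_at for r in records)
    elapsed = end - start
    return len(records) / elapsed if elapsed > 0 else 0.0


log = [RequestRecord(0.0, 2.0), RequestRecord(1.0, 4.0)]
times = [response_time(r) for r in log]
rate = throughput(log)
```

Response time is a property of each request, whereas throughput is a
property of the whole run, which is why the two can move
independently when requests overlap.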

B. THE ORGANIZATION OF THE THESIS

This thesis is organized into six additional chapters
beyond this overview. Chapter II describes our performance
measurement methodology for database systems. It initially
discusses the need for such a methodology and continues with
a separate discussion of both the internal and external
performance measurements. The chapter then culminates with
a discussion of the combination of the two performance
measurements, thus providing the methodology to calculate and
adjust for internal performance measurement overhead.

Chapter III presents an overview of the target system,
MDBS, used to apply the performance measurement methodology.
A general discussion is given on the attribute-based data
model, the directory tables, the process structure, the
message types, and the execution of a retrieve request.

The application of the performance measurement methodology
to the target system, MDBS, is presented in Chapter
IV. The modifications to the MDBS software needed
to perform the measurements are discussed, along with
the modifications to the test environment
required to control the measurement results. A description
of the additional software used for both inter-computer and
inter-process message processing measurements is also
provided.

Chapter V presents the construction of the test database
and the selection of the requests used in the performance
measurements. In this chapter, the design of the desired
test database is first discussed. Due to system
constraints, only a subset of this design is used for
testing purposes. The chapter concludes with an analysis of
the requests used in the performance measurement.

All the thesis work is brought together in Chapter VI
with the presentation of the performance measurement
results. Since the goals of this thesis are to verify the
performance and capacity claims of MDBS and to provide a
methodology for the performance measurement of a database
system, only the tests needed to attain these goals are
performed. In the chapter, results are provided for the
external and internal performance measurements, and for the
message processing measurements.

The thesis ends with conclusions in Chapter VII which
can be made from the results. It provides a summation for
the entire thesis and offers suggestions for future work
which needs to be done both with the methodology and with
the measurement of MDBS. It is hoped that this thesis will
provide a sound methodology for the performance measurements
of database systems and also provide a definitive verification
of the performance and capacity claims of MDBS.

II. PERFORMANCE MEASUREMENT METHODOLOGY FOR DATABASE SYSTEMS

In this chapter, we present a performance measurement
methodology for database systems. The methodology requires
the collection of both internal and external performance
measurements. The internal performance measurement methodology
is the collection of methods and tools which will
enable a better understanding of the target system by
measuring certain capabilities of that system. In measuring
certain capabilities of the system, we focus on the measurement
of time spent in individual processes of the target
system. The external performance measurement methodology is
the collection of methods and tools which will enable a
better understanding of the target system by measuring the
system as a whole. In measuring the system as a whole, we
focus on the measurement of the response time of the target
system. The response time in a database system is defined
in [Ref. 1] as the time between the initial issuance of the
request by a user and the final receipt of the entire
response set of this request by the user.

In the rest of this chapter, we begin by examining the
need for a database system and the subsequent need to
measure the performance of the system. We then discuss a
general performance measurement methodology, addressing both
internal and external performance measurement as separate
issues. Finally, we conclude the chapter with a discussion
of the combination of internal and external performance
measurement results to provide a complete methodology.

A. THE NEED

The need for a database can best be shown as corresponding
to the need for information. A database is a
repository for the storage of information on a computer, any
item or combination of items of which can be easily accessed
in a relatively short timeframe. A businessman may desire
all the latest pieces of information to make a management
decision. The combat field commander may desire complete,
up-to-the-minute reports to arrive at a tactical decision.

But there are performance and capacity problems that
must be overcome in providing this information. As an ever
increasing amount of information is stored in a database,
the response time of the database system increases noticeably.
In addition to the increase in the size of the database,
there is the effect of increasing the number of users
accessing the system and the number of requests to be
processed by the system. Thus the user must select between
the response time desired and the information desired, a
choice the user does not want to and should not have to
make. The database system needs to be easily upgraded to
accommodate new users and to increase the database size
without noticeable change in response time. This is the
need for response-time invariance in a database system.

Another problem is in the timeliness of a response. The
database system should offer a dependable, constant return
rate for the response to a request. When response time
becomes unreasonably long due to the computer workload, the
user will be frustrated. A user desires to have every
request returned in a timely manner. This is the need for
response-time consistency in a database system.

A final problem is to ensure that all necessary information
is available to the user. Incomplete information is of
little use. For example, a user may require all requests to
have a response within a specified timeframe. This requirement
often dictates the maximum size of the database and the
maximum number of requests. Therefore, an undesirable
limitation is placed on the amount of information available
due to the limitation on database size. Again, the user is
forced into making a tradeoff between the response time and
the available information. Nevertheless, despite the
response time, such information should be made available to
the user on demand. This is the need for availability of
information in the database system.

Therefore, not only is there a need for a database
system, there is also a need for a database system with the
qualities of Invariance, Consistency, and Availability
(ICA). But ICA can be present in varying degrees in a database
system. The degree of ICA can best be demonstrated by
the performance measurement of the database system.

There are two basic types of database systems. The
first is an online software database management system that
runs on the host computer system. The second is a database
machine, which offloads the database functions to a dedicated
backend computer. The current trends in database
systems involve the design, implementation, and use of database
machines [Ref. 1 through 8]. Not only is there an
apparent improvement in ICA with a corresponding price per
performance advantage, but a database machine can free up
resources at the host, provide support for multiple, dissimilar
hosts, and increase the security of the database by the
physical separation of the database and the host. Due
primarily to the trend toward increasing future use of database
machines, this thesis will concentrate on the discussion
and application of the methodology for measuring
database machines.

A database machine is a database system composed of one
or more processors dedicated to performing the database
management functions. It is indisputable that a database
machine is the better of the two types of database systems
with regard to providing an increase in security, allowing
for multiple host support, and freeing up the host
resources. But there still exists the need to demonstrate
an improvement in the ICA on a database machine over the ICA
provided by a host-resident database system. At the same
time, there exists a need to compare the invariance, consistency
and availability of several different database
machines and software systems. Again, this can best be
demonstrated by measuring these systems.

Response-time consistency is more easily achieved in a
database machine than in a database system running on the
host. Whereas the host must share its resources with a
varying workload, the backend can dedicate its resources to
database management. Availability frees the Database
Administrator from the necessity to make tradeoffs between
the size of the database and the response time. The administrator
can then load the database with all the necessary
information regardless of the database size. To achieve and
verify the response-time invariance of a database machine,
a methodology to measure its effectiveness must be
developed.

Thus, the scope of this thesis is to provide a performance
measurement methodology for database machines and to
verify this methodology by verifying the design claims of a
specific database machine, known as MDBS. Again, these
claims are related to the quality of response-time invariance;
that is, to be able to change the size of the database
and at the same time maintain constant response time or to
hold constant the size of the database with the ability to
reduce the response time. Consequently, the measurement of
the response time of a database system becomes the focal
point of our studies. If the response time can be properly
and accurately measured, the claims of the target system can
be verified. Furthermore, the effectiveness of the methodology
can also be verified. A proper measurement of the
response time can provide a baseline measurement to which
other database systems can be compared and thus provide a
price-performance comparison of various systems. This
thesis provides an overhead-free performance-measurement
methodology and applies this methodology to verify the
claims of an experimental database machine.

B. THE APPROACH

In this section, we discuss a general methodology to be
used in the performance measurement of a database system.
This methodology is general and can be applied to any other
database system. We first discuss the internal performance
measurement. This includes the design considerations, the
software engineering criteria and the application of the
methodology to a particular system. Then we present a
discussion of the external performance measurement, again
discussing the design considerations, the software engineering
criteria, and the application of the methodology to
a particular system.

1. A Methodology for Internal Performance Measurement

The goal of the internal performance measurement
methodology is to provide methods and tools which will
enable us to better understand the target system by measuring
certain aspects of that system. A complete understanding
of how the system performs internally may lead to
design modifications or to fine-tuning of the system for
better performance. The internal performance measurement
tools should be unobtrusive to the user, available when
necessary, yet out of the way when not required. They should
be integrated with the target system to produce a smooth
transition between target system operation and the operation
of the tool. In the first part of this section, we address
the design considerations of internal performance measurement
methods. Next, we discuss certain software engineering
criteria which are applicable to the design of good measurement
tools. Finally, we explore the application of the
internal performance measurement methodology to a particular
system.

a. Design Considerations

Internal performance measurement relies on
checkpoints internal to the database system software. A
checkpoint is defined as a procedural invocation inserted
into the system's flow of control to call the performance
measurement routines which are used for the data collection.
System overhead is introduced as each checkpoint is added to
the target system. Additionally, measurement software is
required to process the checkpoint data in a manner compatible
with the existing target system software. That is, a
certain portion of the measurement software must be integrated
with the target system software to handle events such
as data storage, message passing, and information processing
that relate to the checkpoint data. Finally, the existing
target system software may require additional lines of code
to handle new cases introduced by the measurement system.
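A checkpoint in this sense can be sketched as a single call that
records a label and a timestamp. The class, the labels, and the
`process_request` routine below are hypothetical and stand in for
whatever the target system provides; the injectable clock is an
assumption added so that the example is repeatable.

```python
import time


class CheckpointLog:
    """Collects (label, timestamp) pairs from checkpoints."""

    def __init__(self, enabled=False, clock=time.perf_counter):
        self.enabled = enabled
        self.clock = clock
        self.records = []

    def checkpoint(self, label):
        # The procedural invocation placed in the system's flow of control.
        # When measurement is disabled, it returns without recording.
        if self.enabled:
            self.records.append((label, self.clock()))

    def elapsed(self, start_label, end_label):
        """Time between the first occurrences of two checkpoints."""
        stamps = {}
        for label, stamp in self.records:
            stamps.setdefault(label, stamp)
        return stamps[end_label] - stamps[start_label]


def process_request(log, work):
    """Hypothetical target-system routine with two checkpoints inserted."""
    log.checkpoint("request_start")
    result = work()
    log.checkpoint("request_end")
    return result


# A fake clock makes the example deterministic.
ticks = iter([1.0, 3.5])
log = CheckpointLog(enabled=True, clock=lambda: next(ticks))
process_request(log, lambda: "done")
interval = log.elapsed("request_start", "request_end")
```

Even the disabled path costs one call and one test per checkpoint,
which is the kind of residual overhead the methodology sets out to
measure.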

In most external performance measurement, overhead
is negligible. However, internal measurement routines
add significant overhead to the database system which cannot
be disregarded. For internal measurement, we must discover
ways to reduce the overhead generated by the measurement
software. We must also be able to measure the overhead
which cannot be eliminated, so that the measurements can be
adjusted accordingly. A very important requirement is that
the existing target system must maintain the capability of
running unimpeded by the additional measurement software.
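One simple way to adjust for overhead that cannot be eliminated is
sketched below. The thesis's own overhead calculation is not given in
this section, so the model here is an assumption: the same request is
timed externally with and without the internal checkpoints, and the
difference is spread evenly over the checkpoints that executed.

```python
def per_checkpoint_overhead(time_with, time_without, checkpoints_hit):
    """Average overhead added by each executed checkpoint.

    Assumption: every checkpoint costs about the same amount of time.
    """
    if checkpoints_hit <= 0:
        raise ValueError("at least one checkpoint must have executed")
    return max(time_with - time_without, 0.0) / checkpoints_hit


def adjust(measured, checkpoints_inside, overhead_each):
    """Remove the overhead of checkpoints executed inside an interval."""
    return measured - checkpoints_inside * overhead_each


# Response time of one request with and without internal checkpoints.
overhead = per_checkpoint_overhead(time_with=12.0, time_without=10.0,
                                   checkpoints_hit=8)

# An internal interval that contained two of those checkpoints.
adjusted = adjust(measured=3.0, checkpoints_inside=2, overhead_each=overhead)
```

If checkpoints differ widely in cost, a per-checkpoint calibration
would be needed instead of a single average.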

Consideration must be given to the level in the
target system where checkpoints may be placed. Some possible
levels are the very high level, i.e., the system level;
the high level, i.e., the program level; the medium level,
i.e., the subroutine level; and the low level, i.e., the
subroutine segment level. Whereas external performance
measurement only places checkpoints at the very high level,
internal performance measurement places checkpoints below
that level. Checkpoints must be placed at a level which
produces data in sufficient detail to provide the user with
a basic understanding of the system's performance
characteristics. Checkpoints should not be placed at a level so low
as to overwhelm the user with detailed data or to interfere
significantly with system performance.

For internal performance measurement, the user
should have the capability to access selected data out of a
range of possible choices. The user should not be required
to receive information about processes which are not of
current interest. The interface should be easy to use and
should not distract the user from his primary goal of
understanding the database system by requiring the user to
remember the unique syntax or semantics of the test interface.
The collected measurements should be made accessible
to automated processing routines for data reduction.
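The selection and data-reduction requirements can be sketched as a
routine that reports only the processes the user has asked for. The
process names and the sample format are illustrative assumptions, not
MDBS identifiers.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples, selected=None):
    """Average time per process, restricted to the processes of interest.

    samples:  iterable of (process_name, seconds) pairs
    selected: set of process names to report, or None for all
    """
    grouped = defaultdict(list)
    for process, seconds in samples:
        if selected is None or process in selected:
            grouped[process].append(seconds)
    return {process: mean(values) for process, values in grouped.items()}


samples = [
    ("directory_search", 0.25),
    ("record_retrieval", 1.5),
    ("directory_search", 0.75),
]
report = summarize(samples, selected={"directory_search"})
full_report = summarize(samples)
```

Because the raw samples are kept separate from the summary, the same
data can be passed to other reduction routines later.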

b. Software Engineering Criteria

Measurement software should be designed using
modern software engineering methods. The resulting software
should be understandable, maintainable, reliable and compatible
with the target system. Certain software engineering
methods are of particular interest. These methods are
modularization, user-friendliness, data abstraction, and
simplicity.

For modularity, the measurement programs should
be hierarchically structured with well-defined interfaces.
The measurement modules should be reusable throughout the
target system. Modularity allows the system to be easily
extended to checkpoints not considered in the initial
specifications. The test interface should present an easy-to-use
method for obtaining test data. It should automatically
aggregate data while still allowing the user to access raw
data. The user should not have to remember the specific
syntax and semantics of the test interface. Data abstraction
should be used so that subsequent program modifications
do not result in extensive reprogramming. An appropriate
choice of primitives (data structures and operations) will
allow for easy change and produce less system overhead. The
measurement system should be user-friendly. In addition to
obeying the simplicity principle, the test interface should
be forgiving, i.e., the system should not crash on bad input,
should provide readable error diagnostics, and should
anticipate errors and guard against them.
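A forgiving test interface can be sketched as a command parser that
never raises on bad input and always returns a readable diagnostic.
The command set below is hypothetical and chosen only for
illustration.

```python
VALID_COMMANDS = {"raw", "summary", "quit"}  # hypothetical command set


def parse_command(line):
    """Return (command, error); exactly one of the two is None."""
    valid = ", ".join(sorted(VALID_COMMANDS))
    words = line.strip().lower().split()
    if not words:
        return None, "Empty input. Valid commands: " + valid
    command = words[0]
    if command not in VALID_COMMANDS:
        return None, f"Unknown command '{command}'. Valid commands: " + valid
    return command, None


accepted = parse_command("  SUMMARY ")
rejected = parse_command("bogus")
```

Listing the valid commands in every diagnostic means the user does
not have to remember the interface's syntax.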

c. Issues in the Application to Database Systems

Application of the internal performance measurement
methodology to a particular database system requires
that the evaluator understand certain aspects of the target
system. The evaluator must understand the programming
language used to construct the database system, and the
structure and operation of the database system. The evaluator
must be prepared to overcome obstacles presented by the
target system in the course of the implementation of the
performance measurement.

A thorough understanding of the programming
language is necessary to successfully integrate checkpoints
and data collection programs into the existing software
structure. One must be familiar with the data structures,
control structures, naming conventions, and parameter-passing
mechanisms of the language, in order to implement
the measurement programs efficiently and to minimize their
overhead. Knowledge of the language syntax reduces programming
errors and speeds implementation of the measurement
tools.

For effective internal performance measurement,
checkpoints must be correctly placed in the database system.
Incorrectly placed checkpoints increase overhead and degrade
performance measurement by providing useless data to the
user. The evaluator must possess sufficient knowledge of the
target system to allow for the correct placement of checkpoints.
This provides the smooth integration of data collection
programs, data processing programs and data transfer
programs into the existing database system.

Chances are that the target system, when
initially designed, was not designed with internal performance
measurement in mind. Instead, the target system was
designed to process all requests efficiently. Integration
of the internal performance measurement routines may affect
the target system in unexpected ways. Let us consider two
examples of such ways. First, in a message-passing system,
messages generated by the measurement programs may require
modifications to the existing database system so that test
messages will not be confused with the messages of the database
system. Second, the volume of information generated by
the measurement programs may overload selected sections of
the target system. The evaluator of the performance measurement
routines must be prepared for such contingencies.
By using the knowledge of the programming language along
with the knowledge of the database system, the evaluator
must be prepared to offer solutions to the database administrator
on how to gracefully integrate the performance measurement
mechanisms into the target system with proper
modification and without overload.
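The first contingency, keeping test messages apart from database
messages, can be sketched by tagging every message with its kind. The
message layout and names below are assumptions for illustration and
do not reflect the actual MDBS message types.

```python
from dataclasses import dataclass
from enum import Enum


class MessageKind(Enum):
    DATABASE = "database"
    MEASUREMENT = "measurement"


@dataclass
class Message:
    kind: MessageKind
    body: str


def dispatch(messages):
    """Split a mixed stream so test messages never reach database handlers."""
    database, measurement = [], []
    for message in messages:
        if message.kind is MessageKind.MEASUREMENT:
            measurement.append(message)
        else:
            database.append(message)
    return database, measurement


stream = [
    Message(MessageKind.DATABASE, "retrieve request"),
    Message(MessageKind.MEASUREMENT, "checkpoint data"),
]
database_msgs, measurement_msgs = dispatch(stream)
```

Routing measurement traffic to its own queue also gives a natural
place to throttle it if its volume threatens to overload the system.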
2. A Methodology for External Performance Measurement

The goal of external performance measurement is to
provide a collection of methods and tools which will enable
us to better understand the target system by measuring the
system as a whole. In this way we can measure the total
work being done by the database system. We focus on measuring
the response time of the system, the elapsed time
between the issuance of a request and the receipt of the
response to the request.

Internal performance measurement has been shown to
be beneficial in the fine-tuning of a system, and in the
microscopic examination of the work being performed by the
system. External measurement provides a quantitative measurement
of the system from a macroscopic view. This allows
for the comparison of database systems. In the first part
of the section, we discuss the design considerations of the
external performance measurement methods. Next, we present
the software engineering criteria for external performance
measurement. Lastly, we show the application of the
external performance measurement to a system.

a. Design Considerations

External performance measurement should have
negligible overhead, i.e., the response time with external
performance measurement should be the same as the response
time without measurement being performed. This is in fact
the case. The reason that the overhead is negligible is
that only two timing checkpoints need to be made. These
timing checkpoints are placed at the beginning of a request
and the end of the response to the request, thus providing
the elapsed time of the response for a request. The timing
checkpoints need the system time at the start and completion
of the request. The checkpoints are placed at the very high
level to ensure a complete measurement of the total elapsed
time.

There are other issues that must be considered
to ensure that the system being evaluated is as 'pure' as
possible. First, the system should retain only the code
and messages required for the running of the system.
Messages and code incorporated into the system for the
design or debugging of the system should be removed.
Second, the system should not contain unnecessary software
tools designed to aid the measurement, such as those used to
create a test database. Such tools should remain in software
exterior to the actual database system.

An obvious consideration is to ensure that no
human interaction is involved in the timings. The system
software, not the reaction time of the user, is being timed.
Therefore, the timer should start immediately after a user
releases the request. The timer should stop immediately
prior to the display on the selected output device. The
timer is stopped prior to display because of the
varying delays caused by the output devices. The speed of
an output device should not be included in the system
timing results.
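The placement of the two external timing checkpoints described above
can be sketched as follows. The `execute` callable stands in for the
database system under test and `fake_execute` is a placeholder; both
are assumptions, as is the injectable clock used to make the example
repeatable.

```python
import time


def timed_request(execute, request, clock=time.perf_counter):
    """Return (response, elapsed_seconds) for one request."""
    start = clock()               # immediately after the request is released
    response = execute(request)   # the entire response set is assembled here
    elapsed = clock() - start     # stop before any output is displayed
    return response, elapsed


def fake_execute(request):
    """Placeholder for the system under test."""
    return [request.upper()]


# A fake clock makes the example deterministic.
ticks = iter([10.0, 12.5])
response, elapsed = timed_request(fake_execute, "retrieve",
                                  clock=lambda: next(ticks))
print(response)  # display happens only after the timer has stopped
```

Since the request is passed in as data rather than typed during the
timed interval, no human reaction time enters the measurement.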

The final issue involving the placement of
external performance measurement checkpoints is whether to
embed the timer code in the system or to call a timer
routine outside the system. A call to a timer routine
incurs unwanted timing delays, adding to the impurity of a
system. If the timer code is embedded, it can be made to
appear that the system code being tested is embedded in the
timer code, i.e., placing the timer initialization code just
prior to the point of the request by the user and the timer
finalization code just subsequent to the display on the
output device. With these considerations, an optimal placement
of checkpoints can be selected to take external
performance timings.

b. Software Engineering Criteria

Unlike internal performance-measurement software,
which uses software design methodologies, the external
performance-measurement software uses software design tools.
In [Ref. 9], a full description is provided of the necessary
external performance-measurement tools. These tools include
a test-file generation package, a database load subsystem,
and a request generation package.

The purpose of the test-file generation package is to create
a test database. This allows for the easy creation of a
database containing the desired parameters to be evaluated.
The database load subsystem must properly load the files
created by the generation package. This includes the
creation of directories for the test database. The request
generation package is used to create and execute test
requests, and provides for easy variation in the types and
complexity of requests. This package also archives the
requests for later use. Using these tools, the external
performance timings of the database system under measurement
can be easily obtained.

c.      Issues    in   the    Application    of    the    Methodology 

The ease with which external performance measurement can be
performed on a database system can vary. There are two
important considerations: the language in which the system
is written and the degree of software engineering used in
the database system design.

The language needs to be readable and to complement proper
documentation of the system. This will facilitate an
understanding of the system by the system evaluator. The
language must also be powerful enough to easily incorporate
system commands, such as requests for the system time. A
language such as C has these capabilities, being primarily
designed for system programming. C is a high-level language
that is both powerful and portable. Although the support
software tools, such as the database load, can be implemented
in a language other than the language in which the database
system was written, the evaluator needs to be familiar with
several different languages if several different database
systems are to be evaluated.

The degree of software engineering used in the database
system design will most definitely facilitate any external
performance measurement to be done. If the database system
was hierarchically designed using modularity, the knowledge
of the internal workings of the system required of the
evaluator will be minimal. Only the upper level in the
hierarchy needs to be studied for the proper placement of the
checkpoints. External measurement requires only a macro
knowledge of the system, enough to ensure that the
checkpoints are indeed properly placed at the very high
level.

C.   THE COMBINATION OF INTERNAL AND EXTERNAL PERFORMANCE
MEASUREMENTS

Separately, internal and external performance measurements
provide a wealth of information to the evaluator. Internal
performance measurement provides the timings and data
collections of individual processes in the database system.
External performance measurement provides the elapsed time
for the complete request. Yet, when the two methodologies
are combined, there is a synergistic effect on the amount of
information available to the evaluator.

The combination of internal and external performance
measurements is natural. There are benefits to be gained for
one from the other. For example, we can determine the
overhead incurred when using internal performance
measurement. First, using only the external checkpoint, we
collect the elapsed time for processing a particular
request. This time is then compared to the elapsed time of
the request when both internal and external checkpoints are
enabled. The difference in the elapsed times of these two
measurements provides an exact measurement of the overhead
incurred by the internal performance measurement software
for this request.
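
The overhead computation just described can be sketched as
follows, in Python for brevity. The function name and the two
timings are hypothetical values chosen for illustration; they
are not measured MDBS data.

```python
def measurement_overhead(external_only, external_and_internal):
    """Overhead of the internal checkpoints for one request, in seconds.

    external_only         -- elapsed time with only external checkpoints enabled
    external_and_internal -- elapsed time with both kinds of checkpoints enabled
    """
    return external_and_internal - external_only


# Hypothetical timings (seconds) for the same request run twice.
overhead = measurement_overhead(external_only=2.40, external_and_internal=2.65)
print(f"internal measurement overhead: {overhead:.2f} s")
```

Subtracting this overhead from the internally measured timings
is what allows results to be reported net of the measurement
software itself.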

On the other hand, we can use the internal performance
measurement timings to interpret the external performance
measurement timings. In particular, if the external
performance measurement shows that a request takes many
hundredths of a second, the evaluator would want to
determine the precise distribution of the work. Internal
performance measurement can answer these questions. By
combining the two measurements, the whole of the measurement
results is more meaningful and useful than the individual
results.

In the following chapter, the target system, i.e., MDBS, is
described. This is the system selected to be evaluated using
the internal and external performance measurement
methodologies presented in this chapter.
III.  THE MULTI-BACKEND DATABASE SYSTEM (MDBS)

In this chapter we discuss the configuration and theory of
operation of the multi-backend database system (MDBS). This
chapter has been extracted from papers and reports which
have been written on MDBS [Ref. 6, 10, 11, 12].

MDBS uses two or more identical minicomputers and their disk
systems to provide a centralized database system with
support for multiple, dissimilar hosts. One minicomputer
functions as the controller. User access is accomplished
through a host computer which in turn communicates with the
controller. Multiple minicomputers and their disks are
configured in parallel to serve as backends. The original
design and analysis of MDBS is due to J. Menon [Ref. 1, 2].
The implementation and new design efforts are documented in
[Ref. 3 through 6]. The database is distributed across all
of the backends. The database management functions are
replicated in each backend.

As shown in Figure 4.1, the controller and the backends are
connected by a broadcast bus. When a transaction is received
from the host computer, the controller broadcasts the
transaction to all the backends. Each backend has a number
of dedicated disk drives. Since the data is distributed
across the backends, a transaction can be executed by all
backends concurrently. Each backend maintains a queue of
transactions and schedules requests for execution
independent of the other backends, in order to maximize its
access operations and to minimize its idle time. The
controller does very little work. It is responsible for
broadcasting, routing, and assisting in the insertion of new
data. The backends do most of the database operations.
Presently, MDBS is fully operational with a VAX 11/780 as
the controller and two PDP 11/44s and their disks as the
backends.

Figure 4.1   The MDBS Structure.

MDBS is a message-oriented system. In a message-oriented
system, each process corresponds to one system function.
These processes, then, communicate among themselves by
passing messages. User requests are passed between processes
as messages. The message paths between processes are fixed
for the system. The MDBS processes are created at system
start time and exist until the system is stopped.

MDBS is designed to perform the primary database operations,
INSERT, DELETE, UPDATE, and RETRIEVE. Of these four database
operations, only the retrieval operation will be of concern
to us in this thesis. The syntax and semantics of the
retrieve operation are discussed in Chapter V. Users access
MDBS through the host by issuing either a request or a
transaction. A transaction is a set of requests. A request
is a primary operation along with a qualification. A
qualification is used to specify the information of the
database that is to be accessed by the request. More
complete definitions of the MDBS terminology can be found in
the following section.

In  the  remainder  of  this  chapter  we  first  discuss  the 
directory  structure.  Next,  we  provide  an  overview  of  the 
process  structure.  Then,  a  presentation  of  the  message  types 
is  provided.  Lastly,  we  trace  the  execution  sequence  of  a 
retrieve  request. 

A.   THE ATTRIBUTE-BASED DATA MODEL

In this section we discuss the attribute-based data model.
Next we provide some definitions in order to discuss MDBS
directory data. We conclude this section by describing the
tables necessary to maintain the MDBS directory information.




In the attribute-based data model, data is modeled with the
constructs: database, file, record, attribute-value pair,
directory keyword, directory, record body, keyword
predicate, and query. Informally, a database consists of a
collection of files. Each file contains groups of records
which are characterized by a unique set of directory
keywords. A record is composed of two parts. The first part
is a collection of attribute-value pairs or keywords. An
attribute-value pair is a member of the Cartesian product of
the attribute name and the value domain of the attribute.
As an example, <POPULATION, 25000> is an attribute-value
pair having 25000 as the value for the population attribute.
A record contains at most one attribute-value pair for each
attribute defined in the database. Certain attribute-value
pairs of a record (or a file) are called the directory
keywords of the record (file), because either the
attribute-value pairs or their attribute-value ranges are
kept in the directory for addressing the record (file).
Those attribute-value pairs which are not kept in the
directory for addressing the record (file) are called
non-directory keywords. The rest of the record is textual
information, which is referred to as the record body. An
example of a record is shown below.

(  <FILE, Census>, <CITY, Monterey>, <POPULATION, 25000>,
   { Temperate climate }  )

The angle brackets, <,>, enclose an attribute-value pair,
i.e., a keyword. The curly brackets, {,}, enclose the record
body. A record is enclosed in parentheses. The first
attribute-value pair of all records of a file is the same.
In particular, the attribute is FILE and the value is the
file name. For example, the above sample record is from the
Census file.
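
A minimal sketch of such a record, in Python for brevity, is
given below. The class name Record and its fields are our own
illustrative choices and are not MDBS data structures. A
dictionary is used for the keywords because it naturally holds
at most one value per attribute.

```python
from dataclasses import dataclass


@dataclass
class Record:
    keywords: dict   # attribute name -> value; at most one pair per attribute
    body: str = ""   # textual record body

    @property
    def file(self):
        # Every record carries a FILE keyword whose value is the file name.
        return self.keywords["FILE"]


census_record = Record(
    keywords={"FILE": "Census", "CITY": "Monterey", "POPULATION": 25000},
    body="Temperate climate",
)
print(census_record.file)
```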
The database is accessed by indexing on directory keywords
using keyword predicates. A keyword predicate is a
three-tuple consisting of an attribute, a relational
operator (=, ≠, >, <, ≥, ≤), and an attribute value; e.g.,
POPULATION ≥ 20000 is a keyword predicate. More
specifically, it is a greater-than-or-equal-to predicate.
Combining keyword predicates in disjunctive normal form
characterizes a query of the database. The query

(  FILE = Census and CITY = Monterey  ) or
(  FILE = Census and CITY = San Jose  )

will be satisfied by all records of the Census file with the
CITY of either Monterey or San Jose. For clarity, we also
employ parentheses for bracketing predicates in a query.
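
The evaluation of a query in disjunctive normal form can be
sketched as follows, in Python for brevity. The function names
and the three sample records are hypothetical; records are
simplified to dictionaries of keywords, and the ASCII spellings
!=, >= and <= stand for the operators ≠, ≥ and ≤.

```python
import operator

# Relational operators allowed in a keyword predicate.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def satisfies(record, predicate):
    """True if the record has the attribute and the comparison holds."""
    attribute, op, value = predicate
    return attribute in record and OPS[op](record[attribute], value)


def matches(record, query):
    """A query is a list of conjunctions (disjunctive normal form)."""
    return any(all(satisfies(record, p) for p in conjunction)
               for conjunction in query)


query = [
    [("FILE", "=", "Census"), ("CITY", "=", "Monterey")],
    [("FILE", "=", "Census"), ("CITY", "=", "San Jose")],
]
records = [
    {"FILE": "Census", "CITY": "Monterey", "POPULATION": 25000},
    {"FILE": "Census", "CITY": "Columbus", "POPULATION": 500000},
    {"FILE": "Employee", "CITY": "San Jose"},
]
selected = [r for r in records if matches(r, query)]
```

Only the first sample record is selected: the second fails both
CITY predicates and the third is not in the Census file.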

Recall that in MDBS there are four types of requests which
correspond to the four primary database operations. An
example of a retrieve request would be:

RETRIEVE  (  FILE = Census and POPULATION > 10000  )   (CITY)

which retrieves the names of all those cities in the Census
file whose population is greater than 10000. Notice that
the qualification component of a retrieve request consists
of two parts, the query of two predicates ( FILE = Census
and POPULATION > 10000 ) and the target list (CITY). The
query specifies which records of the database are to be
retrieved. The target list specifies the attribute value(s)
to be returned to the user. A user may wish to treat two or
more requests as a transaction. In this situation, MDBS
executes the requests of a transaction without permuting
them, i.e., if T is a transaction containing the requests
<R1><R2>, then MDBS executes the request R1 before request
R2. Finally, we define the term traffic-unit to represent
either a single request or a transaction in execution.
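
The two parts of the qualification, and the ordered execution
of a transaction, can be sketched as follows, in Python for
brevity. The functions retrieve and execute_transaction and
the sample records are hypothetical. The sketch handles a
single conjunction of predicates only and is not the MDBS
implementation.

```python
def retrieve(records, query, target_list):
    """Return the target-list values of every record satisfying the query.

    query is a list of (attribute, test) pairs that must all hold.
    """
    results = []
    for record in records:
        if all(attr in record and test(record[attr]) for attr, test in query):
            results.append(tuple(record[attr] for attr in target_list))
    return results


def execute_transaction(records, requests):
    """Execute the requests of a transaction in the order given (R1 before R2)."""
    return [retrieve(records, query, targets) for query, targets in requests]


census = [
    {"FILE": "Census", "CITY": "Monterey", "POPULATION": 25000},
    {"FILE": "Census", "CITY": "Cumberland", "POPULATION": 9000},
    {"FILE": "Census", "CITY": "Columbus", "POPULATION": 500000},
]
# RETRIEVE ( FILE = Census and POPULATION > 10000 ) (CITY)
query = [("FILE", lambda v: v == "Census"),
         ("POPULATION", lambda v: v > 10000)]
cities = retrieve(census, query, ["CITY"])
```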
B.   THE DIRECTORY TABLES

To manage the database (often referred to as user data),
MDBS uses directory data. Directory data in MDBS corresponds
to attributes, descriptors, and clusters. An attribute is
used to represent a category of the user data; e.g.,
POPULATION is an attribute that corresponds to actual
populations stored in the database. A descriptor is used to
describe a range of values that an attribute can have; e.g.,
(10001 ≤ POPULATION ≤ 15000) is a possible descriptor for
the attribute POPULATION. The descriptors that are defined
for an attribute, e.g., population ranges, are mutually
exclusive. Now the notion of a cluster can be defined. A
cluster is a group of records such that every record in the
cluster satisfies the same set of descriptors. For example,
all records with POPULATION between 10001 and 15000 may form
one cluster whose descriptor is the one given above. In this
case, the cluster satisfies a set consisting of a single
descriptor. In reality, a cluster tends to satisfy a set of
multiple descriptors.
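
The grouping of records into clusters can be sketched as
follows, in Python for brevity. The descriptor names, ranges
and records are hypothetical and chosen only to show that
records which satisfy the same set of descriptors fall into
the same cluster.

```python
from collections import defaultdict

# Hypothetical descriptors: name -> (attribute, low, high), ranges inclusive.
# The ranges defined for one attribute are mutually exclusive.
DESCRIPTORS = {
    "pop_low": ("POPULATION", 0, 10000),
    "pop_mid": ("POPULATION", 10001, 15000),
    "city_monterey": ("CITY", "Monterey", "Monterey"),
    "city_columbus": ("CITY", "Columbus", "Columbus"),
}


def descriptor_set(record):
    """Names of all descriptors the record satisfies."""
    return frozenset(
        name for name, (attr, low, high) in DESCRIPTORS.items()
        if attr in record and low <= record[attr] <= high
    )


def form_clusters(records):
    """Group records so that each cluster shares one descriptor set."""
    clusters = defaultdict(list)
    for record in records:
        clusters[descriptor_set(record)].append(record)
    return dict(clusters)


records = [
    {"CITY": "Monterey", "POPULATION": 12000},
    {"CITY": "Monterey", "POPULATION": 14500},
    {"CITY": "Columbus", "POPULATION": 12500},
]
clusters = form_clusters(records)
```

Here the two Monterey records form one cluster and the
Columbus record forms another, although all three share the
same population descriptor.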

Directory information is stored in three tables: the
Attribute Table (AT), the Descriptor-to-Descriptor-Id Table
(DDIT) and the Cluster-Definition Table (CDT). The Attribute
Table maps directory attributes to the descriptors defined
on them. A sample AT is depicted in Figure 4.2. The
Descriptor-to-Descriptor-Id Table maps each descriptor to a
unique descriptor id. A sample DDIT is given in Figure 4.3.
Note that the pointer shown in Figure 4.3 is not actually in
the DDIT but is shown here for clarity to relate back to the
AT of Figure 4.2. The Cluster-Definition Table maps
descriptor-id sets to cluster ids. Each entry consists of
the unique cluster id, the set of descriptor ids whose
descriptors define the cluster, and the addresses of the
records in the cluster. A sample CDT is shown in Figure 4.4.
Thus, to access the directory data, we must access the AT,
DDIT, and CDT.

Figure 4.2   An Attribute Table (AT).

         Descriptor                          Id

  P->    0 ≤ POPULATION ≤ 50000              D11
         50001 ≤ POPULATION ≤ 100000         D12
         100001 ≤ POPULATION ≤ 250000        D13
         250001 ≤ POPULATION ≤ 500000        D14
  C->    CITY = Cumberland                   D21
         CITY = Columbus                     D22
  F->    FILE = Employee                     D31
         FILE = Census                       D32

  Dij: Descriptor j for attribute i.

Figure 4.3   A Descriptor-to-Descriptor-Id Table (DDIT).

  Id     Desc-Id Set       Addr

  C1     D11,D21,D31       A1,A2
  C2     D14,D22,D32       A3

Figure 4.4   A Cluster-Definition Table (CDT).

One of the key concepts used when designing the test
database (see Chapter V) is defining the descriptors which
are specified for the directory attributes. Thus, we provide
a brief introduction to the three classifications of
descriptors. A type-A descriptor is a conjunction of a
less-than-or-equal-to predicate and a
greater-than-or-equal-to predicate, such that the same
attribute appears in both predicates. An example of a type-A
descriptor is as follows:

((POPULATION ≥ 10000) and (POPULATION ≤ 15000)).

A type-B descriptor consists of only an equality predicate.
An example of a type-B descriptor is:

(FILE = Census).

Finally, a type-C descriptor consists of the name of an
attribute. The type-C attribute defines a set of type-C
sub-descriptors. Type-C sub-descriptors are equality
predicates defined over all unique attribute values which
exist in the database. For example, the type-C attribute
CITY forms the type-C sub-descriptors

(CITY = Cumberland), (CITY = Columbus)

where "Cumberland" and "Columbus" are the only unique
database values for CITY.

C.   THE PROCESS STRUCTURE

Currently, MDBS does not communicate with a host machine.
The absence of this communication requires that the test
interface process, the process used to interact with MDBS,
be placed in the MDBS controller. The current implementation
of MDBS does not utilize a broadcast bus. Instead, MDBS
utilizes parallel communication links (PCLs) to emulate a
broadcast bus. Both the controller and the backends contain
processes to interface with the PCLs for inter-computer
message passing. These processes, while necessary to
interface with the PCLs, are not part of MDBS and will not
be discussed further. Figure 4.5 provides an overview of the
MDBS process structure.

Figure 4.5   The MDBS Process Structure.

1.   The Processes of the Controller

The controller is composed of three processes: Request
Preparation, Insert Information Generation, and Post
Processing. Request Preparation receives, parses and formats
a request (transaction) before sending the formatted request
(transaction) to the Directory Management process in each
backend. Insert Information Generation is used to provide
additional information to the backends when an insert
request is received. Since the data is distributed, the
insert occurs at only one of the backends. Thus, Insert
Information Generation must determine the backend at which
the insert will occur, along with the cluster and descriptor
ids for the insert. Post Processing is used to collect all
the results of a request (transaction) and forward the
information back to the host computer.

2.   The Processes of Each Backend

Each backend is also composed of three processes, which
differ from the controller processes. They are: Directory
Management, Concurrency Control, and Record Processing.
Directory Management performs Descriptor Search, Cluster
Search, and Address Generation. Descriptor Search determines
the descriptor ids that are needed for a request. Cluster
Search finds the cluster ids. Address Generation determines
the secondary storage addresses necessary to access the
clustered records. Concurrency Control determines when the
request can be executed. Record Processing performs the
operation specified by the request.
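
The three phases of Directory Management can be sketched as
follows, in Python for brevity. The table contents are
patterned on the sample tables of Figures 4.3 and 4.4 and are
illustrative only. The function names and the matching rules
are our own simplifying assumptions: only equality predicates
are handled, and a cluster qualifies when it carries a
matching descriptor for every predicate. This is not the
actual MDBS algorithm, which is not specified in this chapter.

```python
# Sample directory tables patterned on Figures 4.3 and 4.4.
DDIT = {  # descriptor id -> (attribute, low, high), ranges inclusive
    "D11": ("POPULATION", 0, 50000),
    "D12": ("POPULATION", 50001, 100000),
    "D13": ("POPULATION", 100001, 250000),
    "D14": ("POPULATION", 250001, 500000),
    "D21": ("CITY", "Cumberland", "Cumberland"),
    "D22": ("CITY", "Columbus", "Columbus"),
    "D31": ("FILE", "Employee", "Employee"),
    "D32": ("FILE", "Census", "Census"),
}
CDT = {  # cluster id -> (descriptor-id set, record addresses)
    "C1": ({"D11", "D21", "D31"}, ["A1", "A2"]),
    "C2": ({"D14", "D22", "D32"}, ["A3"]),
}


def descriptor_search(predicates):
    """For each equality predicate, find the descriptor ids covering its value."""
    return [
        {did for did, (attr, low, high) in DDIT.items()
         if attr == p_attr and low <= value <= high}
        for p_attr, value in predicates
    ]


def cluster_search(descriptor_groups):
    """Clusters whose descriptor-id set intersects every predicate's group."""
    return [cid for cid, (ids, _) in CDT.items()
            if all(ids & group for group in descriptor_groups)]


def address_generation(cluster_ids):
    """Secondary-storage addresses of the records in the given clusters."""
    return [addr for cid in cluster_ids for addr in CDT[cid][1]]


groups = descriptor_search([("FILE", "Census"), ("CITY", "Columbus")])
addresses = address_generation(cluster_search(groups))
```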

D.   THE MDBS MESSAGE TYPES

In this section we describe the MDBS message-passing
facilities first described in [Ref. 13]. In the MDBS
message-passing facilities there are 31 message types and
one general message format (shown in Figure 4.6). This same
format is used for each of the three message-passing
facilities, namely, messages within the controller, messages
within a backend, and messages between computers. Messages
between computers are divided into two classes: messages
between backends, and messages between the controller and
the backends. Figure 4.7 describes each of the MDBS message
types. Figure 4.8 describes the abbreviations used.

  Message Type       (a numeric code).
  Message Sender     (a numeric code).
  Message Receiver   (a numeric code).
  Message Text       (an alphanumeric field terminated by an
                     end of message marker).

Figure 4.6   The General Message Format.
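
The general message format could be modeled as shown below,
in Python for brevity. The class Message, the space-separated
encoding, the end-of-message marker and the sender and
receiver codes are all assumptions made for illustration. The
actual MDBS encoding and process codes are not given in this
chapter.

```python
from dataclasses import dataclass

END_OF_MESSAGE = "\0"  # assumed marker; the real MDBS marker is not specified here


@dataclass
class Message:
    msg_type: int   # numeric code of the message type
    sender: int     # numeric code of the sending process
    receiver: int   # numeric code of the receiving process
    text: str       # alphanumeric message text

    def encode(self):
        """Flatten to one string: three numeric codes, then the terminated text."""
        return (f"{self.msg_type} {self.sender} {self.receiver} "
                f"{self.text}{END_OF_MESSAGE}")

    @classmethod
    def decode(cls, raw):
        """Rebuild a Message from the string produced by encode()."""
        msg_type, sender, receiver, text = raw.rstrip(END_OF_MESSAGE).split(" ", 3)
        return cls(int(msg_type), int(sender), int(receiver), text)


# The process codes 1 and 4 are hypothetical.
msg = Message(msg_type=6, sender=1, receiver=4,
              text="RETRIEVE (FILE = Census) (CITY)")
round_trip = Message.decode(msg.encode())
```

Using one format for all three facilities means that the same
encode and decode logic serves messages within the controller,
within a backend, and between computers.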
Figure 4.7   MDBS Message Types.
SOURCE OR DESTINATION DESIGNATION

  HOST
  REQP  :  REQUEST PREPARATION
  IIG   :  INSERT INFORMATION GENERATION
  PP    :  POST PROCESSING
  DM    :  DIRECTORY MANAGEMENT
  RECP  :  RECORD PROCESSING
  CC    :  CONCURRENCY CONTROL

PATH DESIGNATION

  H     :  HOST
  C     :  CONTROLLER
  B     :  A BACKEND

Figure 4.8   MDBS Message Abbreviations.

Communication between computers in MDBS is achieved by using
a time-division-multiplexed bus called the parallel
communication link (PCL) [Ref. 14]. MDBS contains a software
interface to this bus for each computer consisting of two
complementary processes. The first process, get-pcl, gets
messages from other computers off the PCL. The second
process, put-pcl, puts messages on the bus to be sent to
other computers. The controller and each backend have their
own get-pcl and put-pcl processes.

In the remainder of this section, we give short descriptions
of the MDBS messages. These definitions are of the form:

(message-type number) message-type name: explanation of
message.

The descriptions are grouped by the process that receives
the message. They are given in the following figures:
Request Preparation (Figure 4.9), Post Processing (Figure
4.10), Directory Management (Figure 4.11), Record Processing
(Figure 4.12), Concurrency Control (Figure 4.13), Host
processed for Test Interface (Figure 4.14), and Insert
Information Generation (Figure 4.15).
(1)  Host Traffic Unit : The traffic unit represents a
     single request or transaction from a user at the
     host machine.

(13) Record that has Changed Cluster : This message is
     a record which has changed cluster. Request
     Preparation will prepare it as an insertion and
     send it to the backends.

(29) No More Generated Inserts : This message indicates
     that all the records that have changed cluster as
     a result of an update request have been sent to
     Request Preparation.

(14) Results of a Fetch or Retrieve Caused by an Update :
     This message carries the information from a fetch
     or retrieve back to Request Preparation to complete
     an update with a type-III or a type-IV modifier.


Figure 4.9   Request Preparation Messages.


(3)  Number of Requests in a Transaction : Request
     Preparation sends to Post Processing the number
     of requests in a traffic unit. This enables Post
     Processing to determine whether the processing of
     a traffic unit is complete.

(4)  Aggregate Operators : Request Preparation sends
     the aggregate operators to Post Processing.

(5)  Requests with Errors : Requests with errors will
     be found in Request Preparation by the Parser and
     sent to Post Processing directly. Post Processing
     will send requests with errors back to the host.

(11) Results of a Request from a Backend : This message
     contains the results that a specific backend found
     for a request.

(12) Aggregate Operator Results from a Backend : When
     an aggregate operation needs to be done on the
     retrieved records, each backend will do as much
     aggregation as possible in the aggregate operation
     function of Record Processing. This message
     carries those results to Post Processing.


Figure 4.10   Post Processing Messages.
(6)  Parsed Traffic Unit : This is the prepared traffic
     unit sent by Request Preparation.

(29) No More Generated Inserts : This message indicates
     that insert requests for all the records that have
     changed cluster as a result of an update request
     have been generated and sent to Directory
     Management.

(7)  New Descriptor Id : This message is a response to
     the Directory Management request for a new
     descriptor id.

(8)  Backend Number : This message is used to specify
     which backend is to insert a record.

(15) Descriptor Ids : This message contains the results
     of descriptor search by Directory Management.

(19) Old and New Values of Attribute being Modified :
     Record Processing uses this message to check
     whether a record that has been updated has changed
     cluster.

(31) An Update Request has Finished : Record Processing
     signals Directory Management that an update request
     has finished execution.


Figure 4.11   Directory Management Messages.


(16) Request and Disk Addresses : This message contains a
     request and disk addresses for Record Processing to
     come up with the results for the request.

(17) Changed Cluster Response : Directory Management uses
     this message to tell Record Processing whether an
     updated record has changed cluster.

(29) No More Generated Inserts : This message indicates
     that all insert requests generated as a result of
     an update request have been sent to Record
     Processing.

(18) Fetch : Fetch is a special retrieval of information
     for Request Preparation due to an update request
     with a type-IV modifier.

Figure 4.12   Record Processing Messages.
Figure 4.13   Concurrency Control Messages.


(2)  Request Results : Contains the results for a request
     after being collected from all the backends and
     aggregated, if necessary.


Figure 4.14   Host Messages.


(9)  Cluster Id : Directory Management sends a cluster
     id to Insert Information Generation for an insert
     request. IIG will decide where to do the insert.

(10) Request for New Descriptor Id : When Directory
     Management has found a new descriptor, it is sent
     to Insert Information Generation to generate an id.


Figure 4.15   Insert Information Generation Messages.


E.   THE EXECUTION OF A RETRIEVE REQUEST

In this section, we describe the sequence of actions for a
retrieve request as it moves through MDBS. The sequence of
actions will be described in terms of the messages passed
between the MDBS processes: Request Preparation (REQP),
Insert Information Generation (IIG), Post Processing (PP),
Directory Management (DM), Record Processing (RECP) and
Concurrency Control (CC). For completeness, we describe the
actions which require data aggregation.

First the retrieve request comes to REQP from the host. In
the present implementation, it comes from the controller.
REQP sends two messages to PP: the number of requests in the
transaction and the aggregate operator of the request. The
third message sent by REQP is the parsed traffic unit which
goes to DM in the backends. DM sends the type-C attributes
needed by the request to CC. Since type-C attributes may
create new type-C sub-descriptors, the type-C attributes
must be locked by CC. Once an attribute is locked and
descriptor search can be performed, CC signals DM. DM will
then perform Descriptor Search on m/n predicates, where m is
the number of predicates specified in the query, and n is
the number of backends. DM then signals CC to release the
lock on that attribute. DM will broadcast the descriptor ids
for the request to the other backends. DM now sends the
descriptor-id groups for the retrieve request to CC. A
descriptor-id group is a collection of descriptor ids which
define a set of clusters needed by the request.
Descriptor-id groups are locked by CC, since a descriptor-id
group may define a new cluster. Once the descriptor-id
groups are locked and Cluster Search can be performed, CC
signals DM. DM will then perform Cluster Search and signal
CC to release the locks on the descriptor-id groups. Next,
DM will send the cluster ids for the retrieval to CC. CC
locks the cluster ids, since a new address may be specified
for an existing cluster. Once the cluster ids are locked,
and the request can proceed with Address Generation and the
rest of the request execution, CC signals DM. DM will then
perform Address Generation and send the retrieve request and
the addresses to RECP. Once the retrieval has executed
properly, RECP will tell CC that the request is done and the
locks on the cluster ids can be released. The retrieval
results are aggregated by each backend and forwarded to PP.
PP completes the aggregation after it has received the
partial results from every backend. When PP is done, the
final results will be sent to the user.
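
The two-stage aggregation at the end of this sequence can be
sketched as follows, in Python for brevity. We assume an
averaging operator purely for illustration; the aggregate
operators supported by MDBS are not listed in this chapter,
and the function names and data values are hypothetical. Each
backend reduces its own records to a (sum, count) pair, and
Post Processing combines the pairs once every backend has
reported.

```python
def backend_partial_avg(values):
    """Partial aggregate computed by one backend: (sum, count)."""
    return (sum(values), len(values))


def post_processing_avg(partials):
    """Complete the average once every backend's partial result has arrived."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical POPULATION values held by two backends.
backend_data = [[25000, 12000], [500000]]
partials = [backend_partial_avg(values) for values in backend_data]
average = post_processing_avg(partials)
```

Forwarding (sum, count) pairs rather than per-backend averages
keeps the final result correct when the backends hold
different numbers of qualifying records.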
IV.  AN APPLICATION OF THE METHODOLOGIES TO MDBS

In the previous chapters we discussed the separate topics of
methodologies for doing internal and external performance
measurements of database systems and the Multi-backend
Database System (MDBS). This chapter presents the
application of these methodologies to MDBS. The initial
discussion concerns modification of the MDBS software. We
discuss the decisions made during implementation,
modification of the user interface process, the backend
processes and the controller processes, and the issues
resolved during implementation. The next discussion centers
on the modifications of the MDBS test environment, which
include test environment changes and software tools. The
final discussion identifies measurement programs that were
implemented outside of the MDBS environment.

A. THE MODIFICATION OF THE MDBS SOFTWARE

In this section, we begin by presenting the decisions
made concerning the implementation of internal and external
performance measurements on MDBS. Next, we discuss the
modifications of the user interface and the individual MDBS
processes. We conclude this section by relating issues which
are resolved during the implementation of the performance
measurement methodology.

1. Implementation Decisions

When designing and specifying internal and external
performance measurement methodologies, decisions must be
made as to the most advantageous positions to place the
checkpoints, data collections and data aggregations. These
decisions are based on the need to minimize system overhead
and to provide the appropriate level of detail of the test
data obtained. Primitives and data structures must be
developed which will allow the measurement programs to remain
extensible and which are compatible with existing system
software. A user interface must be developed which is easy
to use, does not require the user to possess any special
knowledge of the interface in order to use it, and maintains
data in machine-readable form which will allow for
future expansion of the performance measurement system.

The following implementation decisions are within
the bounds of two constraints placed upon us by the current
implementation of MDBS along with two constraints we placed
on ourselves. The first constraint concerns the virtual
memory available to the processes resident on the backends.
The operating system on the PDP-11/44 allocates a virtual
memory of 64 Kbytes. Each of the MDBS backend processes must
fit into a virtual memory of this size. The additional
software added as a result of performance measurement has to be
constructed so that it will fit in the very limited memory
space remaining in each backend process. The second
constraint concerns the initial MDBS design requirements
which called for a broadcast bus between minicomputers.
Currently a Parallel Communications Link (PCL) is being
employed as the inter-computer message-passing mechanism.
Messages passed over the PCL are sequentially transmitted
from the sender to the receiver. This difference in
operation between the PCL and the broadcast bus must be taken
into account in our attempt to validate the claims of MDBS.
Additional performance measurement programs must also be
written to measure message-passing times on the PCL.

The third constraint, i.e., minimizing overhead,
significantly influences our performance measurement design.
This subject will be discussed in the following paragraphs.
The final constraint deals with our desire to run MDBS
unimpeded by the new performance measurement software. When we
are not evaluating the system, we want to be able to run
MDBS with no overhead incurred by the additional programs
and checkpoints of the performance measurement system. This
is accomplished by bracketing all performance measurement
software within special preprocessor instructions which
allow us to include or omit the performance measurement
software during program compilation. A definition file is
created containing flags which are used to determine the
sections of performance measurement code to be compiled. By
compiling separate versions, we then have the capability of
running MDBS without performance measurement overhead or
with the overhead introduced when we select certain portions
of the performance measurement software for compilation.

Communication in MDBS is accomplished by passing
messages. Processes which are resident in the same
minicomputer communicate by using inter-process messages, while
processes resident in different minicomputers communicate by
using inter-computer messages. Actions taken by the various
processes in MDBS are initiated by the receipt of a message.
Actions end when that message has been processed and any
resultant messages have been sent. As a message is received
by a process, the action taken by the process is dependent
on the message origination and type. The general MDBS
process procedure hierarchy is shown in Figure 5.1.

Figure 5.1  The MDBS Procedure Hierarchy.

The highest level of this process is the main
procedure. This procedure receives the next message and, based
upon the originator of the message, calls a sub-procedure in
the procedure hierarchy. The message works its way down this
tree of sub-procedures based upon the originator of the
message and the message type. Ultimately, the message
arrives at a message-handling procedure (message handler).
The message handler has the responsibility of processing the
message. In doing so, it may call other procedures lower in
the hierarchy. MDBS's message-oriented approach naturally
lends itself to checkpoint placement at this level.
Selection of measurement at this level provides the user
with sufficient processing details while not overburdening
the user with excessive information. A range of six to
twelve checkpoints may be installed in each MDBS process at
this level. The general approach to the installation of
checkpoints is shown in Figure 5.2. In this installation, we
insert checkpoints both before and after every message
handler. As a result, we obtain the time of entry into the
procedure and exit from the procedure. The difference
between these times is the time it takes to process the
message.
Figure 5.2  The MDBS Procedure Hierarchy with Checkpoints.

Measuring at this level presents one problem. The
system clocks are not sufficiently refined for the
processing speed of the message-handling routines. The clock
on the PDP-11/44 measures time in discrete intervals of
only one sixtieth of a second. The clock on the VAX-11/780
measures time in discrete intervals of only one
hundredth of a second. Within any given time interval, the
system time may be accessed by the performance measurement
software. This means that access may occur exactly when a
time interval is recorded by the system clock or anywhere
between two successive recordings. Because of this
condition, the variance of the time
measurement would be approximately twice the smallest
interval. This variance is significant when it is compared
to the time it takes a message-handling routine to process a
message. A method must be developed to reduce this variance.
The solution is to send multiple requests to the
message-handling routine being timed, to record the time for each
request and then to compute the average of the recorded
times, thereby obtaining a more accurate measurement of the
true processing time.
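
The effect of averaging over a coarse clock can be illustrated
with a small simulation. The sketch below is not part of the MDBS
software; the function names, the 5 ms handler time and the
assumption that the 100 requests start at evenly spread offsets
within one clock interval are all hypothetical choices made for
illustration. Times are kept in integer microseconds.

```python
# Illustrative sketch only: hypothetical names, integer microseconds.
TICK_US = 16_667  # roughly one sixtieth of a second (PDP-11/44 clock interval)


def clock_reading(true_time_us, tick_us=TICK_US):
    """Return what a coarse clock reports: the last completed interval."""
    return (true_time_us // tick_us) * tick_us


def measured_duration(start_us, true_duration_us, tick_us=TICK_US):
    """Difference of the clock readings at the entry and exit checkpoints."""
    entry = clock_reading(start_us, tick_us)
    exit_ = clock_reading(start_us + true_duration_us, tick_us)
    return exit_ - entry


def averaged_duration(true_duration_us, repetitions=100, tick_us=TICK_US):
    """Average the coarse measurements of identical, repeated requests.

    Assumption: the requests start at evenly spread offsets within one
    clock interval, so the rounding errors tend to cancel.
    """
    step = tick_us // repetitions
    samples = [measured_duration(i * step, true_duration_us, tick_us)
               for i in range(repetitions)]
    return sum(samples) / len(samples)


if __name__ == "__main__":
    true_us = 5_000  # a hypothetical 5 ms message handler
    print("single measurement (us):", measured_duration(0, true_us))
    print("average of 100 (us):", averaged_duration(true_us))
```

A single measurement can only be a whole number of clock
intervals, whereas the average of many measurements can fall
between them, which is why repetition recovers a usable estimate.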

In order to keep overhead to a minimum and to keep
the performance measurement system extensible and simple,
we decide to place minimal performance measurement software
in an MDBS process. No processing of test data is done in an
MDBS process. All test data is sent to the test-interface
routines for aggregation and storage. Since MDBS is a
message-based system, measurement control messages and test
data are transferred as messages utilizing existing MDBS
communications routines. A differently-oriented system, such
as a procedure-oriented one, would require a different approach to
measurement software communication.

The installation of the checkpoints requires that a
method be devised to collect the information obtained by the
checkpoints. The information could be stored locally,
transferred to a central storage location in the minicomputer or
sent to the test interface for storage. In order to reduce
the system overhead introduced by message passing we
determine that the temporary local storage of data would be most
efficient. As pointed out previously, one of the constraints
placed upon the implementation of performance measurement is
the virtual memory space available at the backends. Storage
of the test data generated from the checkpoints would have
to be large enough to contain sufficient timing information
and small enough to reside in the constrained virtual
memory space available to a backend process. For our timing
measurements, the upper bound on the number of requests sent
to a message handler at one time is fixed at one hundred. In
other words, we assume that measuring a given function more
than 100 times will not provide a statistically significant
difference over measuring that same function exactly 100
times. Given this upper bound, we decide that a static array
of 200 records would be small enough to fit in the virtual
memory of a backend process, yet large enough to hold a
sufficient amount of test data. Figure 5.3 shows the general
approach to the placement of the performance measurement
routine (Timer) which is called by the checkpoint, accesses
the system clock and manages the static array.

Another question that must be answered is the manner
in which the checkpoints are activated. Should we activate
only one checkpoint at a time or multiple checkpoints at
once? We determine that activating more than one checkpoint
at a time could introduce error into the measurement. If one
routine (A) which is being measured calls another routine
(B) which is also being measured, the time necessary to do
timing measurements on (B) would increase the total running
time of (A). Because of this we only allow the measurement
of one routine at a time.

The desire to provide a user interface which is easy
to use and requires no particular knowledge of test
interface implementation leads us to develop a menu-driven
system. The modularity of the performance measurement design
lends itself to easy access via menus. The menu-driven
system is also compatible with the existing test interface
system.

Figure 5.3  The Procedure Hierarchy with Checkpoints and Timer.

The final problem is how to process and store the
raw test data received from the various processes. We
require that the user have access to both raw data and
summarized information. Also, we require that the data be
available for further machine processing. These problems are
eliminated by maintaining all collected data in files. When
raw data is received from a process it is immediately stored
in a file. Once all requests which are to be timed have
finished, the file containing the raw data is accessed and
processed to produce another file containing summarized
information on the various message-handling routines which
have been measured. A history of this information is
compiled as the underlying operating system (on the
minicomputer where the controller software resides) creates new
versions of these files each time the measurement programs
are invoked.

2. The Modifications of the User Interface

Prior to the implementation of the performance
measurement methodologies, the test interface process consisted
of those programs necessary to generate a test database,
load a test database and execute requests against the test
database. The implementation of performance measurement
software within the existing software structure of the test
interface is accomplished by expanding the existing
hierarchy of control and by integrating performance measurement
software with existing test interface software. Figure 5.4
shows the test interface procedural hierarchy with the
performance measurement modifications.

The user selects actions to be performed by
traversing a tree. At each node, a decision is made as to the
path to follow. By following certain paths, the user has the
capability to generate a database, load a database or
execute the test interface. When the user decides to execute
the test interface, a decision is then made as to what path
to follow on the test interface sub-tree. The user may
choose a new database to work with, create a new list of
traffic units, modify an existing list of traffic units,
select traffic units from an existing list for execution,
select an existing list so that all traffic units on the
list may be executed, display the results of external
measurement or perform a combination of internal and external
performance measurement. The user may traverse the tree at
will, moving up and down the branches to accomplish a wide
variety of tasks.

The MDBS software in the test interface contains a
procedure, called by the other MDBS procedures, to execute a
request (transaction), that is, to forward a request
(transaction) to Request Preparation for processing. This
procedure is selected for the placement of the external and
internal performance measuring software necessary to time
and manipulate requests. External measurements are taken
from this procedure immediately before the request is sent
to Request Preparation for processing and after the results
are returned from Post Processing. Software is added to this
procedure to generate requests to the MDBS processes which
initialize the message-handling routines for internal
performance measurement, to generate multiple, identical
requests in order to reduce the timing variance (as
previously discussed) and to generate the test data collection
message. The number of multiple requests to generate is
provided to this routine by a variable defined at compile
time. This procedure receives the information necessary to
accomplish these other tasks by sharing a first-in-last-out
stack and a pointer to the top of the stack with the
performance measurement software. The evaluator interacts
with the performance measurement software to build a stack
of internal performance measurement requests. This procedure
then draws from that stack, initializes the message-handling
routine selected by the evaluator, generates
multiple copies of the MDBS request selected by the
evaluator, and generates the request necessary to collect the
test data from the process which contains the message
handler being evaluated. Figure 5.5 shows the relationship
between this procedure and the performance measurement
software and its data structures.

Figure 5.5  The Relationship of the Request Execution
and the Performance Measurement Interface.

In addition to external system timing, other
performance measurement functions provided by the new
software include routines which allow the user to 1) select
specific MDBS message-handling routines to be timed, 2)
select all message-handling routines within a process to be
timed, 3) restrict the timing of backend message-handling
routines to a specific backend or backends and 4) perform
any combination of the aforementioned selections. The new
performance measurement software also includes routines to
control the timing software within the MDBS processes,
collect raw data transferred to the test interface from
processes within MDBS, process the raw data into summarized
form and store the data for future use. Other routines are
introduced into existing test interface software to aid in
the message-passing requirements of the performance
measurement system.

3. The Modification of Individual Processes

The PCL processes within MDBS are modified to pass
performance measurement messages. All of the remaining MDBS
processes receive identical modification. The send/receive
portion of every process is modified to include the
capability of processing performance measurement messages.
Send/receive is used for inter-process message passing.
Checkpoints are placed in the MDBS processes at the
message-handling (low) routine level. A timer routine is placed in
each process which receives control messages from the test
interface. An initialization message causes the timer
routine to initialize the data collection array to zero and
turn on a selected checkpoint. As MDBS-generated messages
pass through a checkpoint, the timer routine is called. The
timer routine accesses the system clock and stores the
message type and time in an array. A completion message from
the test interface causes the routine to transmit the data
collected in the array to the test interface and to turn off
the checkpoint which is timed. Figure 5.6 shows the
modifications made to the directory management process as an
example of the implementation of the general modifications
shown in Figure 5.3.
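
The behaviour of the timer routine described above can be
sketched as follows. This is a minimal illustration, not the MDBS
code: the class `Timer`, its methods `initialize`, `checkpoint`
and `complete`, the helper `processing_times` and the injectable
`clock` argument are hypothetical names. The sketch also assumes
that the entry and exit checkpoints of one message handler share
a single checkpoint name, so that records arrive in entry/exit
pairs.

```python
import time

MAX_RECORDS = 200  # size of the static array chosen in the text


class Timer:
    """Hypothetical per-process timer routine driven by control messages."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock                   # stands in for the system clock
        self.records = [None] * MAX_RECORDS  # fixed-size storage, never grows
        self.count = 0
        self.active_checkpoint = None        # only one checkpoint on at a time

    def initialize(self, checkpoint):
        """Initialization message: clear the array, turn one checkpoint on."""
        self.records = [None] * MAX_RECORDS
        self.count = 0
        self.active_checkpoint = checkpoint

    def checkpoint(self, name, message_type):
        """Called as a message passes a checkpoint; records only if it is on."""
        if name != self.active_checkpoint or self.count >= MAX_RECORDS:
            return
        self.records[self.count] = (message_type, self.clock())
        self.count += 1

    def complete(self):
        """Completion message: hand back the data, turn the checkpoint off."""
        data = self.records[:self.count]
        self.active_checkpoint = None
        return data


def processing_times(data):
    """Pair successive entry/exit records and return the time differences."""
    return [data[i + 1][1] - data[i][1] for i in range(0, len(data) - 1, 2)]
```

In this sketch the differences are computed outside the timer,
which mirrors the decision to do no processing of test data
inside an MDBS process.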

Figure 5.6  The Directory Management Hierarchy
with Checkpoints/Software.

4. Issues Resolved During the Implementation

MDBS is an experimental database machine. As such,
it is under constant modification and subject to use by many
system developers. The MDBS software engineering environment
requires that versions be used to control program
modification, but it is impractical to create new versions of MDBS
every time a single program is modified. One solution we
implemented is the maintenance of an in-use file. When
someone desires to modify a program, the program is copied
to the developer's private work space. The developer makes
an entry in the in-use file which indicates who is currently
modifying that particular program. This method allows the
developer to modify a program, compile and test the
modification in an environment away from the main MDBS environment
(in order to avoid compromising a functional system) and
return the program to MDBS upon completion of testing. This
method avoids the possibility of two developers concurrently
modifying the same program and the ensuing problems. Machine
time is also at a premium. There is no easy solution for
this. Much of the measurement must be conducted during
non-peak hours such as late evenings and weekends. This is
necessitated by the requirement that the measurement of MDBS be
accomplished in a stand-alone environment. Since the MDBS
controller is implemented on a time-sharing system, the
entire machine has to be reserved for performance testing so
that MDBS can be run in isolation.

The performance measurement system places additional
demands on MDBS system message-passing software. Except for
one case, the system responds without protest to this
unexpected load. The message-processing routines of the MDBS
backends are not designed to handle the transfer of 200
internal performance-measurement messages from a backend
process to the controller. There is not sufficient space
available to store the pointers required to access this many
messages. The MDBS programs are easily extended to account
for this change in message traffic.

The MDBS controller resides on a VAX-11/780 which
operates in a time-sharing mode. When inter-computer
messages are passed on the PCL, the operating system expects
a confirmation within a certain time interval. While no
problems occur during the normal operation of MDBS, the
large message traffic from the backends to the controller
during internal measurement requires more time than that
allotted to the controller during its quantum on the
VAX-11/780. The result is that the controller processes on
the VAX are suspended while the backend is still transmitting
over the PCL. When the PCL receives no response it signals
an error and aborts. Obviously, this is not a problem when
the MDBS system runs stand-alone. However, such an abort
is more than an inconvenience during the
implementation of the performance measurement system.

Currently, MDBS utilizes two different types of
minicomputers. This translates into two different operating
systems, two different text editors, two different compilers
and two different system clocks which record times in
differing units. Because of this, performance programs in
the controller processes and the backend processes are not
identical. Different access mechanisms for system timers
must be developed and a routine must be developed to convert
the times received from the backends into the equivalent
time units of the controller. Additional time and effort
are required to become sufficiently knowledgeable on the two
systems in order to begin implementation of the performance
measurement methodology.
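
The conversion between the two clocks amounts to a change of
units. The sketch below uses the clock intervals stated earlier
(one sixtieth of a second on the PDP-11/44 backends, one
hundredth of a second on the VAX-11/780 controller). The function
name is hypothetical, and it is an assumption that times arrive
from the backends as counts of clock intervals.

```python
from fractions import Fraction

BACKEND_TICKS_PER_SECOND = 60      # PDP-11/44 clock interval: 1/60 s
CONTROLLER_TICKS_PER_SECOND = 100  # VAX-11/780 clock interval: 1/100 s


def backend_to_controller_units(backend_ticks):
    """Convert a backend interval count into controller time units.

    A Fraction keeps the result exact; rounding, if wanted, is left
    to the caller so that averaged values are not truncated early.
    """
    return Fraction(backend_ticks * CONTROLLER_TICKS_PER_SECOND,
                    BACKEND_TICKS_PER_SECOND)
```

For instance, three backend intervals correspond to five
controller units, since both equal one twentieth of a second.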

B. THE MODIFICATION OF THE MDBS TEST ENVIRONMENT

In conducting performance measurements, one demands that
all the measurements be consistent as well as reproducible.
There should be no inconsistent, unexplainable results.
Further, the results should be reproducible with re-runs.
This section discusses the necessary changes in the test
environment to ensure consistency and reproducibility. Then
we present the software tools used to make the testing
easier and smoother.

1. Necessary Changes to the Test Environment

The methodologies for internal and external
performance measurements on a database system have one
prerequisite. The results must not be accidental. These results
need to be consistent and reproducible. To achieve
consistency and reproducibility, we must be able to control the
test environment. Every scientific experiment requires the
test environment to be controlled, to ensure that all
factors affecting the experiment are known.

The experimental MDBS, the system to be tested, has
its controller processes running under a VAX/VMS
environment. This requires these processes of the controller to be
run simultaneously with the other non-system processes in a
time-sharing environment. Under this environment, the
results obtained would be erratic and inconsistent. To
alleviate this, several preliminary steps are taken prior to
final testing. The tests are run stand-alone with all other
logins to the computer disabled. All processes are given
the highest possible real-time priority. Swapping out of
processes in the wait state is disabled to retain the
processes in physical memory. Page faults are avoided
by increasing the working set size to the size of the image
of each process. In this way, the VAX/VMS system appears to
the evaluator as a single-user system.

2. Software Tools for the Test Environment

An evaluator should understand the system to be
tested, determine the various parameters to be altered,
specify the various data to be collected, and interpret the
results. Tedious busy work, such as modifying the input
set or the system configuration, can be done manually but
is time-consuming without proper tools. Nevertheless,
these modifications are necessary, and can be automated by
using software tools.

In [Ref. 9], software has been provided to generate
a database and a request set, load the database, and run the
request set against the database, all of which can be used in
the testing of MDBS. This allows for easy creation and
modification of the selected database and requests. The
system software needs to be modified during the testing to
accommodate such things as changes in the number of backends
being used by the system and whether or not internal testing
is to be performed. Each change requires a recompilation of
the system software. To facilitate this change and to
ensure recompilation of only the necessary files, the Unix
'make' command is used. Briefly, execution of this command
checks a file created by the author. This file
indicates all interdependencies of all files of MDBS. If a
file has been changed, all other files affected by this
change will automatically be recompiled and relinked upon
execution of the 'make' command. In this manner, the system
can be reconfigured with ease and with the assurance that
all affected files are changed. Using these software tools
for the test environment and with proper control of the test
environment, the tests are made easier to conduct and
control and are known to be consistent and reproducible.

C. ADDITIONAL MEASUREMENT SOFTWARE REQUIREMENTS

In order to completely evaluate MDBS, the message
passing mechanisms must be monitored to determine the time
required to pass both inter-computer and inter-process
messages. Although the measurement of these messages could
occur during the execution of MDBS, the environment under
which the messages are passed can be more easily
controlled if the messages are evaluated outside of the MDBS
environment. The results of these measurements are contained
in Chapter VI.

1. Inter-computer Message Processing Measurement

New software does not have to be developed to
measure the time required to pass messages on the PCL.
Programs are provided by the manufacturer of the PCL which
measure the message-passing time. The evaluator is given the
capability of specifying which node on the PCL is to receive
the message, the message length and the number of messages
to send. The software generates and sends the messages, then
provides the total time to transmit the messages to the
evaluator. The PCL is implemented as a ring bus. Because of
this style of implementation, we decide to send messages
from one selected node to itself. The times obtained are an
upper bound to the inter-computer message passing time.

2  •   Inter-prgcess  Message  Processing  Measurement 

Programs are written for the inter-process message
processing measurement. To determine the time required to
pass a message, we developed two programs. The first program
gets the time, generates a selected number of messages with
a selected message length, and sends them to a second
program, which receives the messages and then gets the time.
We run the first program at a higher system priority than
the second to prevent the system from switching processes
before all the messages have been generated. After the first
program has generated all the messages, we set the system
priority of the sending program below that of the receiving
program, thereby forcing a process switch. We can then
compute the average time it takes to pass a single message
on the machine. To obtain a higher degree of accuracy, we
must account for the time it takes the system to switch
processes and the time it takes the system to alter the
priority of the sending process. Programs are written to
account for these times. The program written to account for
the time necessary to alter the priority merely gets the
time, alters the priority a selected number of times, then
gets the time again. Two programs are necessary to determine
the time to switch processes. They are identical to the two
programs mentioned above except that the number of messages
between process switches is set to one. Utilizing the above
programs, we are able to obtain the inter-process
message-passing times on both the PDP-11/44 and the
VAX-11/780. The next chapter discusses the selected
database, the request sets, and the procedures taken to run
the actual benchmark of MDBS.
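
The correction described above is simple arithmetic. The following is a minimal sketch of it, not the measurement programs themselves: the function names, argument names, and sample figures are hypothetical, and it assumes that each run contains exactly one priority change and one forced process switch.

```python
def average_message_time(elapsed, n_messages, switch_time, priority_time):
    """Return the average seconds needed to pass one message.

    elapsed       -- seconds between the sender's first time stamp and
                     the receiver's final time stamp
    n_messages    -- number of messages generated in the run
    switch_time   -- measured cost of one process switch, in seconds
    priority_time -- measured cost of one priority change, in seconds
    """
    if n_messages <= 0:
        raise ValueError("n_messages must be positive")
    # Assumption: each run contains exactly one priority change and one
    # forced process switch, so both are subtracted once.
    overhead = switch_time + priority_time
    return (elapsed - overhead) / n_messages


def average_priority_change_time(elapsed, n_changes):
    """Average seconds per priority change from a timed loop of changes."""
    if n_changes <= 0:
        raise ValueError("n_changes must be positive")
    return elapsed / n_changes


# Hypothetical figures, for illustration only.
print(average_message_time(elapsed=1.25, n_messages=100,
                           switch_time=0.2, priority_time=0.05))
```

Sending many messages per run spreads the fixed overhead over a large count, which is why the single-message runs are needed only to isolate the process-switch cost.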

V.  THE BENCHMARK OF MDBS

The construction of the test database and the selection
of requests are very important in the performance
measurement of a test database system. The test database
should be representative of a real database, but, as
presented in [Ref. 7], the test database should be modeled
independently of any specific database. Both the test
database and the requests selected should be properly
modeled to allow for a complete exercise of the target
system. At the same time, parameters must not be selected
randomly, but rather should be chosen to provide the
evaluator flexibility and ease of evaluation. In this
chapter, we first describe the manner in which the test
database is modeled. We then describe the request set which
is used in the performance measurement experiments.

A.  THE SELECTED DATABASE

Since MDBS is an experimental database system, it is
constantly being improved and enhanced. For this reason,
the test database is designed to facilitate measurements by
being easily expandable. A distinction will be made in the
following discussions between the design of the test
database, which allows for future measurements, and the
actual implementation of the test database used in the
measurement experiments.

1.  The Design of the Model Database

Several factors must be considered in the design of
a model database. Since the system being measured can be
configured with either one or two backends, the 'work'
required to process a request has to be evenly divisible to
accommodate the use of either one or two backends. The
types of work involved are attribute search, descriptor
search, cluster search, address generation, and the
retrieval and selection of records.

Table I displays the three configurations to be used
in the performance measurement of MDBS. The configurations
have been selected to simplify the verification of the MDBS
performance and capacity claims. These claims are to 1)
halve the response time by doubling the number of backends
and keeping the size of the database constant and 2)
maintain a constant response time by doubling both the
number of backends and the size of the database. As shown in
Table I, going from Test A to Test B maintains a constant
database size but allows the database to be evenly split
between two backends. Conversely, going from Test B to
Test C doubles the size of the database at each backend.


TABLE I
The Benchmark Configuration

Test    No. of Backends    Mbyte/backend

 A             1                n
 B             2              0.5 n
 C             2                n

To properly evaluate a database system, various
record sizes need to be used. The sizes are chosen based on
the size of the unit of disk management. In MDBS, this is
the logical track, or block. MDBS processes information
from the secondary memory using a 4-Kbyte logical track.
Given a block size of 4 Kbytes, we recommend constructing
the database with record sizes of 200 bytes, 400 bytes, 1000
bytes, and 2000 bytes [Ref. 7]. This gives a range of 2 to
20 records per block. This also creates an environment
where four separate databases, corresponding to the four
record sizes, must be generated and tested for each
configuration given in Table I. Table II gives the
corresponding relationship between records and blocks.


TABLE II
The Record-and-Block Relationship

Record Size in Bytes    Records per Block

        200                    20
        400                    10
       1000                     4
       2000                     2

As described in Chapter III, the target system
stores records in clusters. Five cluster categories have
been selected for use in the creation of the model database.
The distinguishing characteristic of a cluster category is
the number of blocks used to store the records in the
particular category. Table III outlines the sizes of each
of the five cluster categories. One final note: the number
of blocks per cluster must be even. Thus, when the number
of backends is increased from one to two with the number of
records in the database remaining constant, we are
guaranteed that each backend will have the same number of
blocks per cluster. For example, when the cluster category
is small, each backend would have one block for the
particular cluster, ensuring an even distribution of the
database.


TABLE III
The Cluster Arrangement

Cluster Category    Blocks/Cluster

small                      2
small-medium               4
medium                     6
medium-large               8
large                     10


Combining the data in Tables II and III, we can
construct a matrix of data which represents the number of
records per cluster category. Table IV, indexed using the
cluster category and the record size, details this
information. The number of records per cluster is obtained
by multiplying the Records/Block results from Table II by
the corresponding Blocks/Cluster results from Table III.
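
The multiplication described above can be sketched as follows. This is an illustrative sketch only: the names are hypothetical, a 4-Kbyte block is assumed to hold 4096 bytes, and the blocks-per-cluster figures are the even values from 2 to 10 implied by the records-per-cluster matrix rather than values quoted from Table III.

```python
BLOCK_SIZE = 4096                      # assumed: 4 Kbytes = 4 * 1024 bytes
RECORD_SIZES = [200, 400, 1000, 2000]  # bytes, as recommended in the text

# Assumed blocks per cluster for each category (even numbers, 2 to 10).
BLOCKS_PER_CLUSTER = {
    "small": 2,
    "small-medium": 4,
    "medium": 6,
    "medium-large": 8,
    "large": 10,
}


def records_per_block(record_size, block_size=BLOCK_SIZE):
    """Whole records that fit in one block (records do not span blocks)."""
    return block_size // record_size


def records_per_cluster():
    """Matrix of records per cluster, keyed by category and record size."""
    return {
        category: {size: blocks * records_per_block(size)
                   for size in RECORD_SIZES}
        for category, blocks in BLOCKS_PER_CLUSTER.items()
    }


for category, row in records_per_cluster().items():
    print(f"{category:<14}", [row[size] for size in RECORD_SIZES])
```

Because every blocks-per-cluster value is even, each entry of the matrix is also even, so a cluster divides into equal halves when a second backend is added.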

TABLE IV
The Records per Cluster Category

                   Number of Records per Cluster
Cluster               (Record Size in Bytes)
Category         (200)    (400)    (1000)    (2000)

small              40       20        8         4
small-medium       80       40       16         8
medium            120       60       24        12
medium-large      160       80       32        16
large             200      100       40        20

The remaining considerations when developing a test
database involve the specification of the directory
structure for the particular record type. In MDBS, a record
template, which describes the record structure, is defined.
The record template defines the number of attributes in the
record and, for each attribute, the attribute name and the
attribute type (either integer or string). Given a record
template, the directory and non-directory attributes are
specified. For each directory attribute, a descriptor type
and descriptor ranges are defined (see Chapter III).

2.  The Implementation of the Model Database

This section examines the implementation decisions
made when specifying the test database and the testing
strategy. The current version of MDBS, with
primary-memory-based directory management, stores the
directory tables, i.e., the AT, DDIT, and CDT, in primary
memory. Given the primary memory limitations of the backend,
we are forced to limit the variables mentioned in the
previous section. Our first decision is to limit the size of
the test database to a maximum of 1000 records per backend.
Table V displays the configurations which are used in the
performance measurements of MDBS.

TABLE V
The Measurement Configurations

Test    No. of Backends    Records/Backend

A.E            1                1000
B.E            2                 500
C.E            2                1000
A.I            1                1000
B.I            2                 500

Five different system configurations are needed for
the MDBS performance measurements. Tests A.E, B.E, and C.E
are conducted without internal performance software in
place. Test A.E configures MDBS with one backend and one
thousand records in the test database. Test B.E configures
MDBS with two backends and one thousand records split evenly
between the backends. Test C.E also configures MDBS with
two backends, but the size of the database is doubled to
two thousand records. Tests A.I and B.I are conducted with
internal performance software in place. Test A.I configures
MDBS with one backend and one thousand records in the test
database. Test B.I configures MDBS with two backends and
one thousand records split evenly between the backends.
Using these five configurations, the verification of the
MDBS performance and capacity claims is simplified and the
performance measurement methodology of computing the
internal measurement overhead is facilitated.

Our second decision fixes the record size at 200
bytes. The 200-byte record minimizes the primary memory
required to store the record template. In actuality, a
record of 198 bytes is used. The record consists of 33
attributes, each requiring 6 bytes of storage. The record
template file used in our experiments is shown in Figure
6.1. Of the 33 attributes listed, INTE1 and INTE2 are
directory attributes. MULTI and STR00 to STR29 are
non-directory attributes.

In our next decision, the descriptor types and the
descriptor ranges for the two directory attributes, INTE1
and INTE2, are defined in the descriptor file (see Figure
6.2). The values for INTE1 are classified by using five
type-A descriptors, each of which represents a range of 200.
The values for INTE2 are also classified using type-A
descriptors. The first twenty-three ranges for INTE2 cover
40 values each, with the last range covering 80 values. The
non-uniformity of the INTE2 descriptor ranges is caused by a
size constraint in the Concurrency Control process.


Attribute Name    Attribute Type

INTE1             integer
INTE2             integer
MULTI             string
STR00             string
STR01             string
  ...               ...
STR29             string

Figure 6.1  The Record Template File.

By utilizing the attribute and descriptor files, the
record file is generated. INTE1 and INTE2 are identical,
each being the next sequential number after that of the
previous record, starting at 1. Therefore, the one
thousandth record would have the (INTE1, INTE2) pair set to
1000. The MULTI attribute, which is of type character
string, is set to One for a database of only 1000 records.
The intent of this attribute is to increase the number of
records per cluster in the database. This is done by
setting MULTI to Two, Three, etc., for each (INTE1, INTE2)
pair in the database. Therefore, to double the size of the
database, every (INTE1, INTE2) pair will have an associated
MULTI attribute with values of One and Two. The remaining
attributes, STR00 to STR29, are set to Xxxxx as fillers.
Figure 6.3 depicts the general layout of the record file for
1000 records where MULTI is set to One.

Attribute Name    Descriptor Type    Descriptor Range

INTE1                   A              1 ->  200
                                     201 ->  400
                                     401 ->  600
                                     601 ->  800
                                     801 -> 1000

INTE2                   A              1 ->   40
                                      41 ->   80
                                      81 ->  120
                                     121 ->  160
                                     161 ->  200
                                     201 ->  240
                                     241 ->  280
                                     281 ->  320
                                     321 ->  360
                                     361 ->  400
                                     401 ->  440
                                     441 ->  480
                                     481 ->  520
                                     521 ->  560
                                     561 ->  600
                                     601 ->  640
                                     641 ->  680
                                     681 ->  720
                                     721 ->  760
                                     761 ->  800
                                     801 ->  840
                                     841 ->  880
                                     881 ->  920
                                     921 -> 1000

Figure 6.2  The Descriptor File.

INTE1    INTE2    MULTI    STR00    STR01    ...    STR29

   1        1     One      Xxxxx    Xxxxx    ...    Xxxxx
   2        2     One      Xxxxx    Xxxxx    ...    Xxxxx
 ...      ...     ...       ...      ...     ...     ...
1000     1000     One      Xxxxx    Xxxxx    ...    Xxxxx

Figure 6.3  The Record File.
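
The record-file layout described above can be sketched as follows. This is a minimal illustration rather than the generator used in the experiments: the function name, the in-memory dictionary form, and the `copies` parameter are hypothetical.

```python
MULTI_VALUES = ["One", "Two", "Three"]              # values named in the text
STRING_ATTRS = [f"STR{i:02d}" for i in range(30)]   # STR00 .. STR29


def generate_records(n_keys=1000, copies=1):
    """Build the test records as a list of attribute dictionaries.

    n_keys -- number of distinct (INTE1, INTE2) pairs, numbered from 1
    copies -- how many MULTI values to emit per pair; copies=2 doubles
              the database without changing the number of clusters
    """
    if not 1 <= copies <= len(MULTI_VALUES):
        raise ValueError("copies must be between 1 and %d" % len(MULTI_VALUES))
    records = []
    for key in range(1, n_keys + 1):
        for multi in MULTI_VALUES[:copies]:
            # INTE1 and INTE2 always carry the same sequential number.
            record = {"INTE1": key, "INTE2": key, "MULTI": multi}
            # The thirty string attributes are constant fillers.
            record.update((name, "Xxxxx") for name in STRING_ATTRS)
            records.append(record)
    return records


base = generate_records()            # 1000 records, MULTI = One
doubled = generate_records(copies=2) # 2000 records, MULTI = One and Two
print(len(base), len(doubled), len(base[0]))
```

Doubling through MULTI keeps the directory attributes unchanged, so each cluster simply holds twice as many records.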

Given the structure described, our last decision is
made for us. The specification of 24 descriptors for the
INTE2 attribute, coupled with the record file structure,
generates a database that contains 24 clusters. The first
23 clusters correspond to the small cluster category, and
each contains 40 records. The last cluster corresponds to
the small-medium cluster category and contains 80 records.
To maintain consistency in the retrieval requests (discussed
in the next section), we avoid any requests that access the
last 80 records in the test database using the INTE2
attribute.
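
The cluster counts above follow directly from the descriptor ranges. The sketch below illustrates this under the assumption that a cluster is the set of records sharing the same pair of INTE1 and INTE2 descriptors; the helper names are hypothetical.

```python
from collections import Counter

# Five INTE1 ranges of 200 values each: (1, 200) .. (801, 1000).
INTE1_RANGES = [(lo, lo + 199) for lo in range(1, 1001, 200)]
# Twenty-three INTE2 ranges of 40 values, then one range of 80 values.
INTE2_RANGES = [(lo, lo + 39) for lo in range(1, 921, 40)] + [(921, 1000)]


def descriptor_of(value, ranges):
    """Index of the descriptor range that contains value."""
    for index, (lo, hi) in enumerate(ranges):
        if lo <= value <= hi:
            return index
    raise ValueError(f"{value} is outside every descriptor range")


def cluster_sizes(n_keys=1000):
    """Count records per cluster for keys 1..n_keys (INTE1 == INTE2)."""
    return Counter(
        (descriptor_of(key, INTE1_RANGES), descriptor_of(key, INTE2_RANGES))
        for key in range(1, n_keys + 1)
    )


sizes = cluster_sizes()
print(len(sizes), sorted(set(sizes.values())))
```

Since every INTE2 range lies inside a single INTE1 range, the INTE2 descriptors alone determine the clusters.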

B.  THE REQUEST SET

The request set used for our performance measurement is
given in Figure 6.4. The retrievals are a mix of single- and
double-predicate requests. Since the majority of the work
done on a database is to retrieve data, we limit the
measurements to retrieve requests only. In every request,
1/2 of the attribute values of each record are returned as
target attribute values. The first retrieve is a request for
only two records from two separate clusters. The second
request retrieves 1/4 of the database. Seven of the 24
clusters must be examined. All records in each of the first
six clusters are retrieved. Only 1/4 of the seventh
cluster, defined by the range from 241 to 280, is retrieved.
In the third request, 1/2 of the database is retrieved.
Thirteen of the 24 clusters must be examined. All records in
each of the first twelve clusters are returned. Only 1/2 of
the thirteenth cluster, defined by the range from 481 to
520, is retrieved. The system searches only for records
having values in the range from 481 to 500 in this cluster.

Request Number    Retrieval Request

      1           (INTE1=10) or (INTE1=230)
      2           (INTE2 < 250)
      3           (INTE2 < 500)
      4           (INTE1 < 1000)
      5           (INTE1<200) or (INTE1>801)
      6           (INTE1<400) or (INTE1>601)
      7           (INTE1 < 201)
      8           (INTE1 < 401)
      9           (INTE1<201) or (INTE1>800)

The Target Attribute-Values for Each:

(INTE1, INTE2, MULTI, STR00, STR01, STR02, STR03, STR04,
STR05, STR06, STR07, STR08, STR09, STR10, STR11, STR12)

Figure 6.4  The Retrieval Requests.

The entire database is examined in the fourth request.
The fifth request retrieves 2/5 of the database. The query
is divided into two predicates to obtain all records from
the first five clusters and the last four clusters. The
sixth request is a retrieval of 4/5 of the database. Again
the query is divided into two predicates, to obtain all
records from the first 10 clusters and the last nine
clusters.

The seventh and eighth requests are similar in intent.
The seventh request examines 10 clusters, requiring only 1
record to be retrieved from the 6th cluster and needing all
records from the first five clusters. The eighth request
examines 15 clusters, requiring only 1 record to be
retrieved from the 11th cluster and needing all records from
the first ten clusters. The ninth and final request is
similar to the fifth request. Unlike the fifth request,
however, ten additional clusters must be examined. Only two
records, with INTE1 values of 201 and 801, are retrieved
from the ten additional clusters. All records in the
remaining nine clusters, as in the fifth request, are also
obtained by this retrieval. Table VI, a presentation of the
number of clusters examined versus the percent of the
database retrieved, is a synopsis of the previous
discussion in tabular form.
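
The cluster and volume figures discussed above can be checked with a short sketch. It rests on two assumptions that are ours, not statements from the text: every bound is treated as inclusive (which is what the stated fractions imply), and a cluster is taken to be examined whenever it lies inside a descriptor range that overlaps the predicate. Request (1), an equality lookup, is left out. All names are hypothetical.

```python
INTE1_RANGES = [(lo, lo + 199) for lo in range(1, 1001, 200)]
INTE2_RANGES = [(lo, lo + 39) for lo in range(1, 921, 40)] + [(921, 1000)]
RANGES = {"INTE1": INTE1_RANGES, "INTE2": INTE2_RANGES}

# Each request is a list of (attribute, low, high) predicates joined by
# "or". Bounds are inclusive intervals (an assumption, see lead-in).
REQUESTS = {
    2: [("INTE2", 1, 250)],
    3: [("INTE2", 1, 500)],
    4: [("INTE1", 1, 1000)],
    5: [("INTE1", 1, 200), ("INTE1", 801, 1000)],
    6: [("INTE1", 1, 400), ("INTE1", 601, 1000)],
    7: [("INTE1", 1, 201)],
    8: [("INTE1", 1, 401)],
    9: [("INTE1", 1, 201), ("INTE1", 800, 1000)],
}


def clusters_examined(predicates):
    """Number of clusters touched by the descriptors the predicates hit."""
    examined = set()
    for attr, lo, hi in predicates:
        # Descriptor ranges of this attribute that overlap [lo, hi].
        matched = [(dlo, dhi) for dlo, dhi in RANGES[attr]
                   if dlo <= hi and lo <= dhi]
        # Each cluster corresponds to one INTE2 range, because INTE1 and
        # INTE2 hold the same value in every record.
        for index, (clo, chi) in enumerate(INTE2_RANGES):
            if any(dlo <= clo and chi <= dhi for dlo, dhi in matched):
                examined.add(index)
    return len(examined)


def records_retrieved(predicates, n_keys=1000):
    """Number of keys 1..n_keys that satisfy at least one predicate."""
    return sum(1 for key in range(1, n_keys + 1)
               if any(lo <= key <= hi for _, lo, hi in predicates))


for number, predicates in REQUESTS.items():
    print(number, clusters_examined(predicates), records_retrieved(predicates))
```

Under these assumptions the seventh request touches ten clusters but returns only one record from the second group of five, which is the situation the text describes.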

The request set in Figure 6.4 is not intended to be
representative of a comprehensive and complete request set.
The goal is not to exhaustively measure and evaluate MDBS.
Rather, we focus on applying the performance measurement
methodology to MDBS to validate the basic performance and
capacity claims of the system. We feel that these requests
are sufficient for such a validation. We will refer to
these nine requests, i.e., retrievals, by their request
number in subsequent discussion.

TABLE VI

The Number of Clusters Examined
and the Percent of the Database Retrieved

Request    Number of Clusters    Volume of Database
Number         Examined              Retrieved

   1                              2 records
   2               7              25%
   3              13              50%
   4              24 (all)        100%
   5               9              40%
   6              19              80%
   7              10              20% + 1 record
   8              15              40% + 1 record
   9              19              40% + 2 records

VI.  THE TEST RESULTS

In this chapter, we present the results obtained from
the performance measurement of MDBS. MDBS is currently
configured with the primary-memory-based directory
management. In this version of MDBS, the directory tables,
i.e., the AT, DDIT, and CDT, are stored in the primary
memory. We expect to achieve different results when version
F, the secondary-memory-based directory management, is
implemented. The test interface is utilized to send the
retrieval requests discussed in the previous chapter to MDBS
for processing. Each request is sent a total of ten times
per database configuration. The response time of each
request is recorded. After some trial runs, we compute the
standard deviation. We determine that ten repetitions of
each request are sufficient to provide the desired accuracy.
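
The per-request summary described above can be sketched as follows. The timings are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was computed.

```python
import statistics


def summarize(response_times):
    """Return (mean, sample standard deviation) of repeated timings."""
    if len(response_times) < 2:
        raise ValueError("need at least two timings")
    return statistics.mean(response_times), statistics.stdev(response_times)


# Ten hypothetical response times (seconds) for one request.
timings = [3.21, 3.19, 3.22, 3.20, 3.18, 3.23, 3.21, 3.20, 3.19, 3.22]
mean, stdev = summarize(timings)
print(f"mean = {mean:.3f} s, stdev = {stdev:.4f} s")
```

A standard deviation that is small relative to the mean is what justifies stopping at ten repetitions.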

The internal processing times of the message-handling
routines which are used to process a retrieval request are
also timed. Retrieval (1) and Retrieval (2) are selected to
conduct internal timing. These requests are selected since
they retrieve the smallest portion of the test database and
the processing time for each request is minimal. Recall that
each message-handling routine is timed independently of all
others and that each routine must process multiple requests
so that an accurate average may be computed for the time
required to process that request type. Sixteen
message-handling routines are required to process a retrieve
request. If we send twenty requests to each routine, a total
of 320 requests must be processed by MDBS. Based on these
figures, the time required to conduct the internal
performance measurement of a retrieval that has a response
time of twenty seconds will be approximately 107 minutes.
This figure does not include the administrative time
required to process the internal measurement data. For this
reason, we limited the internal performance measurement
requests to Retrievals (1) and (2).
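
The 107-minute estimate above is a direct product of the figures given. A small sketch, with a hypothetical function name, makes the arithmetic explicit:

```python
def internal_measurement_minutes(routines=16, repetitions=20,
                                 response_time_s=20.0):
    """Return (total requests, total minutes) for one internal timing run.

    Each message-handling routine is timed separately, so every routine
    needs its own batch of repeated requests.
    """
    total_requests = routines * repetitions
    total_minutes = total_requests * response_time_s / 60.0
    return total_requests, total_minutes


requests, minutes = internal_measurement_minutes()
print(requests, round(minutes))
```

The cost grows linearly with the response time, which is why only the two fastest retrievals are timed internally.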

Additionally, we limited the number of repetitions
per message handler to twenty. This is done to reduce the
processing time per message handler. However, this decision
reduces the accuracy of the internal performance
measurement from ten-thousandths to hundredths of a second.
Thus, the internal performance measurement times provide
only a rough estimate of the time required to handle the
respective messages.

The first section of this chapter contains the external
timing results obtained from our measurements. We also
discuss the performance and capacity improvements obtained
by adding backends. In the second section we present the
results from internal performance measurement. The final
section examines the inter-process and inter-computer
message processing times. One final note: the units of
measurement presented in the tables of this chapter are
expressed in seconds.

A.  THE EXTERNAL PERFORMANCE RESULTS

Table VII provides the results of the external
performance measurement of MDBS without the internal
performance measurement software. There are three parts to
Table VII. Each part contains the mean and the standard
deviation of the response times for Retrievals (1) through
(9), which are outlined in Chapter V. The three parts of
Table VII represent three different configurations of the
MDBS hardware and the database record capacity. The first
part has MDBS configured with one backend and the database
loaded with 1000 records. The second part has MDBS
configured with two backends, with the database containing
1000 records, split evenly between the backends. The third
part has MDBS configured with two backends, with the
database doubled to 2000 records, also split evenly between
the backends. In Table VII we notice one data anomaly: the
standard deviation for request (9) in the
one-backend-with-1000-records configuration. Since we did
not conduct an internal performance measurement on this
request, we are not sure what causes this skewed standard
deviation, and hence will not attempt to offer an
explanation of this anomaly.

TABLE VII

The Response Time
Without Internal Performance Evaluation Software

            One Backend        Two Backends       Two Backends
Request     1K Records         1K Records         2K Records
Number      mean     stdev     mean     stdev     mean     stdev

   1        3.208   0.0189     2.051   0.0324     3.352   0.0282
   2       13.691   0.0255     7.511   0.0339    14.243   0.0185
   3       26.492   0.0244    14.164   0.0269    26.737   0.0405
   4       52.005   0.0539    26.586   0.0294    52.173   0.0238
   5       21.449   0.0336    11.309   0.0375    21.550   0.0237
   6       42.235   0.0326    21.622   0.0424    42.287   0.0400
   7       12.285   0.0408     6.642   0.0289    12.347   0.0371
   8       22.532   0.0296    11.764   0.0300    22.583   0.0110
   9       23.913   0.1115    12.624   0.0350    24.169   0.0181

Given the data presented in Table VII, we can now
attempt to verify or disprove the two MDBS performance
claims. We begin by calculating the response-time
improvement of MDBS. The response-time improvement is
defined to be the percentage improvement in the response
time of a request when the request is executed in n backends
as opposed to one backend and the number of records in the
database remains the same. Figure 7.1 provides the formula
used to calculate the response-time improvement for a
particular request, where Configuration B represents n
backends and Configuration A represents one backend. Thus,
in Table VIII we present the response-time improvement for
the data given in Table VII. Notice that the response-time
improvement is lowest for request (1), which represents a
retrieval of two records of the database. On the other hand,
the response-time improvement of request (4), which
retrieves all of the database information, is highest,
approaching the upper bound of fifty percent. In general, we
find that the response-time improvement increases as the
number of records retrieved increases. This seems to support
a hypothesis that even if the database grows, the
response-time improvement will remain at a relatively high
(between 40 and 50 percent) level.

Figure 7.1  The Response-Time-Improvement Calculation.
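
The definition above can be illustrated with a short sketch. The expression used here, the drop in response time divided by the one-backend response time, is our reading of the definition and is an assumption rather than the formula as printed in Figure 7.1; the function names are hypothetical. The sample values are the mean response times reported for requests (1) through (3).

```python
# Mean response times in seconds for requests (1)-(3):
# (one backend, two backends), both with 1000 records.
TABLE_VII = {
    1: (3.208, 2.051),
    2: (13.691, 7.511),
    3: (26.492, 14.164),
}


def response_time_improvement(t_one, t_n):
    """Percentage by which the response time drops from t_one to t_n."""
    return (t_one - t_n) / t_one * 100.0


def ideal_improvement(n_backends):
    """Upper bound if the work divides perfectly across n backends."""
    return (1.0 - 1.0 / n_backends) * 100.0


for request, (t_one, t_two) in TABLE_VII.items():
    print(request, round(response_time_improvement(t_one, t_two), 2))
print("upper bound for two backends:", ideal_improvement(2))
```

The fixed per-request cost at the controller does not shrink with added backends, which is consistent with small retrievals showing the lowest improvement.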

TABLE VIII

The Response-Time Improvement Between
1 and 2 Backends (External Measurement Only)

Request Number    Response-Time Improvement (percent)

      1                     36.07
      2                     45.14
      3                     46.53
      4                     48.94
      5                     47.27
      6                     48.81
      7                     45.93
      8                     47.79
      9                     47.21

1K Records, No Internal-Measurement Software

Next we calculate the response-time reduction of MDBS.
The response-time reduction is defined to be the reduction
in response time of a request when the request is executed
in n backends containing nx records as opposed to one
backend with x records. Figure 7.2 provides the formula
used to calculate the response-time reduction for a
particular retrieval request, where Configuration A
represents one backend with x records and Configuration B
represents n backends, each with x records. In Table IX we
present the response-time reductions for the data given in
Table VII. Notice that the response-time reduction is worst
for request (1), which represents a retrieval of two records
of the database. On the other hand, the retrievals which
access larger portions of the database, requests (4) and
(6), have only a small response-time reduction. In general,
we found that the response-time reduction decreases as the
number of records retrieved increases, i.e., the response
time remains virtually constant. Again we seem to have
evidence to support the hypothesis that, as the size of the
database increases, the response-time reduction will
decrease to a relatively low (0.1% or less) level.


Figure 7.2  The Response-Time-Reduction Calculation.
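
A short sketch may help make the response-time reduction concrete. The expression below, the relative change in response time between one backend with x records and n backends with nx records, is an assumption on our part rather than the formula as printed in Figure 7.2; the function name is hypothetical. The sample values are mean response times for requests (1), (2), and (4).

```python
# Mean response times in seconds:
# (one backend with 1000 records, two backends with 2000 records).
TABLE_VII = {
    1: (3.208, 3.352),
    2: (13.691, 14.243),
    4: (52.005, 52.173),
}


def response_time_reduction(t_one_x, t_n_nx):
    """Percentage change in response time when both the number of
    backends and the database size are multiplied by n.

    A value near zero means the response time stayed constant, which
    is the capacity claim being tested.
    """
    return (t_n_nx - t_one_x) / t_one_x * 100.0


for request, (t_one, t_two) in TABLE_VII.items():
    print(request, round(response_time_reduction(t_one, t_two), 2))
```

The fixed overhead of coordinating a second backend is a larger share of a short retrieval, so the smallest request shows the largest relative change.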


TABLE IX

The Response-Time Reduction
In Doubling the Database Size

Request Number    Response-Time Reduction

      1                    4.49
      2                    4.03
      3                    0.92
      4                    0.32
      5                    0.47
      6                    0.12
      7                    0.50
      8                    0.23
      9                    1.07

1K Records on each Backend, No Internal-Measurement Software


Table X provides the results of the external performance
measurement of MDBS with the internal performance
measurement software in place. There are two parts to
Table X. Each part contains the mean and the standard
deviation of the response times for requests (1) through
(6), which are outlined in Chapter V. The two parts of
Table X represent two different configurations of the MDBS
hardware and the database record capacity. Part one has MDBS
configured with one backend and the database loaded with
1000 records. Part two has MDBS configured with two
backends, with the database containing 1000 records, split
evenly between the backends. We did not measure the response
times with two thousand records distributed over two
backends. We felt that no additional information would be
gained by conducting those measurements.


TABLE X

The Response Time (in seconds)
With Internal Performance Measurement Software

            One Backend        Two Backends
Request     1K Records         1K Records
Number      mean     stdev     mean     stdev

   1        3.205   0.0436     2.219   0.0474
   2       13.418   0.0172     7.401   0.0277
   3       25.903   0.0119    13.854   0.0361
   4       50.750   0.0374    26.402   0.0596
   5       20.972   0.0271    11.244   0.0528
   6       41.262   0.0331    21.517   0.0575

An interesting anomaly is discovered when we compare the
response times of the external and internal performance
measurement tests, i.e., parts one and two of Tables VII and
X for requests (1) through (6). We actually found a general
improvement, from 0.1% to 5%, in the response times of the
requests when the internal performance measurement software
is part of the MDBS code. One hypothesis is that this is
due to the manner in which MDBS is implemented on the
backends. Currently, there is not sufficient physical memory
available on each backend. The result is that disk overlays
are used to swap in the code necessary to run MDBS. The
additional internal performance measurement code may cause
the operating system to overlay differently, thereby
benefiting the overall performance of MDBS. We still
believe that there is an overhead induced by the internal
measurement code, and Table XI provides evidence by
demonstrating that the response-time improvement achieved by
adding a backend is not as good as that of Table VIII.


TABLE XI

The Response-Time Improvement Between
1 and 2 Backends (With Internal Measurement Also)

Request Number    Response-Time Improvement (percent)

      1                     30.76
      2                     44.84
      3                     46.52
      4                     47.98
      5                     46.39
      6                     47.85

1K Records, Evaluation Software

B.  THE INTERNAL PERFORMANCE RESULTS

Table XII provides the results of the internal performance
measurement of MDBS for a retrieval request. The times
measured for each message-handling routine are given for
both requests (1) and (2). The message-handling routines are
listed with the MDBS process which contains the routine.
Although the results are given to four decimal places, we
only trust the accuracy to the second decimal place. The
reason for this has been discussed in the introduction to
this chapter. We are not experts on the MDBS system. We can,
however, make a few comments on Table XII, and we are sure
that those who are experts can use the results contained in
Table XII to draw more in-depth conclusions on the system.

We see that the controller processes, i.e., Request
Preparation and Post Processing, spend very little time in
processing the retrieval request. This is a major design
goal of MDBS and is necessary to prevent a bottleneck at the
controller when the number of backends increases
substantially. It appears that this goal is met successfully.
We also observe that the results obtained from Concurrency
Control are consistent and of short duration. This is
expected since there is only one request in the system at a
time and no access contention can occur. These tables should
then be considered as containing the best-case times. The
majority of work done in the backend is at Record
Processing. Observing the process timings in Record
Processing, we see that, for both requests, the addition of
an extra backend reduces the record processing time by
nearly half.
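
The "nearly half" observation can be checked directly against the Entire Process rows for Record Processing in Table XII. The short Python sketch below is only an illustration: the variable and function names are ours, and the figures are the four Record Processing times as they appear in the table.

```python
# Record Processing "Entire Process" times from Table XII,
# keyed by request number: (one backend, two backends).
record_processing = {
    1: (2.6462, 1.3775),
    2: (12.7100, 6.5716),
}


def time_ratio(one_backend, two_backends):
    """Fraction of the one-backend time that remains with two backends."""
    return two_backends / one_backend


ratios = {req: time_ratio(*times) for req, times in record_processing.items()}
for req, ratio in ratios.items():
    print(f"request ({req}): two-backend time is {ratio:.1%} of the one-backend time")
```

Both ratios come out slightly above one half, which is consistent with the statement that a second backend nearly halves the record processing time.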

                              TABLE XII

              Message Handling Routine Processing Times
                       for a Retrieval Request

MDBS Process   Message Handling    Request   One Backend   Two Backends
               Routine             Number    1K Records    1K Records
------------------------------------------------------------------------
Request        Record Count           1        0.0005        0.0015
Preparation    To Post Proc           2        0.0000        0.0000

               Parse                  1        0.0200        0.0190
               Traffic Unit           2        0.0180        0.0185

               Broadcast              1        0.0025        0.0025
               Request                2        0.0065        0.0030

Post           Collect                1        0.0465        0.0250
Processing     Results                2        0.0890        0.0813

Directory      Parsed                 1        0.0699        0.0450
Management     Traffic Unit           2        0.0925        0.0491

               Did Sets               1        0.0516        0.0566
               Locked                 2        0.0566        0.0566

               Cid Sets               1        0.0533        0.0349
               Locked                 2        0.0450        0.0433

               Descriptor             1        na            0.0391
               Ids                    2        na            0.0558

Concurrency    Cids for               1        0.0424        0.0433
Control        Traffic Unit           2        0.0425        0.0433

               Did Sets               1        0.0566        0.0401
               Traffic Unit           2        0.0508        0.0516

               Did Sets               1        0.0025        0.0016
               Released               2        0.0008        0.0006

Record         Entire                 1        2.6462        1.3775
Processing     Process                2       12.7100        6.5716

               Request with           1        0.0466        0.0433
               Disk Address           2        0.0433        0.0383

               Old                    1        0.0130        0.0148
               Request                2        0.0131        0.0165

               PIO                    1        0.0844        0.0865
               Read                   2        0.8593        0.8863

               Disk                   1        0.0799        0.0741
               Input/Output           2        0.0783        0.0725


C.  THE MESSAGE PROCESSING RESULTS

Table XIII provides some average times relating to
inter-process message passing on the controller and the
backend. Messages are transmitted between two processes on
both the controller and the backend. Both the number of
messages and the message length are varied. On the
controller, the number of messages is varied from 1 to 100
while the message length is varied from 2 to 2000 bytes (the
size of the message buffers in the MDBS controller). On the
backend, the number of messages is varied from 1 to 50 while
the message length is varied from 1 to 1000 bytes (the size
of the message buffer in the MDBS backends). It takes the
backend twice as long to process a message as it does the
controller. We believe the reason to be hardware processor
speed. An independent test showed that this two-to-one
relationship also holds in how long it takes to process an
assignment statement on the respective hardware.


                             TABLE XIII

                Inter-process Message Passing Times

Location      Time to Construct   Time to Receive   Time to Pass
              Message             Message           Message
-----------------------------------------------------------------
Controller        0.00249             0.00267          0.00520
Backend           0.00830             0.00410          0.01250
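
The backend-to-controller relationship can be read off Table XIII with a few lines of arithmetic. The Python sketch below is illustrative only: the dictionary layout and names are ours, and the assignment of each value to a column follows our reading of the table.

```python
# Average times from Table XIII: (controller, backend).
message_times = {
    "construct": (0.00249, 0.00830),
    "receive": (0.00267, 0.00410),
    "pass": (0.00520, 0.01250),
}

# How many times longer the backend takes than the controller, per step.
backend_to_controller = {
    step: backend / controller
    for step, (controller, backend) in message_times.items()
}

for step, ratio in backend_to_controller.items():
    print(f"{step}: backend takes {ratio:.2f} times as long as the controller")
```

For passing a message the ratio is a little above two, in line with the statement that the backend takes about twice as long as the controller.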
In Table XIV we provide information concerning the time
to process inter-computer messages on the PCL. For messages
shorter than forty bytes, the passing time is dominated by
the overhead of the PCL. There exists a linear relationship
between the message length and the time to pass a message
once the message length exceeds 100 bytes. We can therefore
expect a linear performance from the PCL for the majority of
the MDBS inter-computer messages. The next chapter will
contain some concluding remarks and discuss areas for
further research.


                             TABLE XIV

               Inter-computer Message Passing Times

        Message Length      Time to Pass       Change
           (Bytes)            Message
        -----------------------------------------------
              10               0.0949          0.0000
              20               0.0951          0.0002
              30               0.0954          0.0003
              40               0.0957          0.0003
              50               0.1005          0.0048
              60               0.1011          0.0006
              70               0.1018          0.0007
              80               0.1023          0.0005
              90               0.1029          0.0006
             100               0.1036          0.0007
             200               0.1136          0.0100
             300               0.1238          0.0102
             400               0.1339          0.0101
             500               0.1439          0.0100
            1000               0.1943          0.0504
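
The linear relationship described for Table XIV can be checked by fitting a straight line to the entries of 100 bytes and above. The Python sketch below is a minimal illustration, not part of the measurement software: it uses an ordinary least-squares fit, and the function and variable names are ours.

```python
# (message length in bytes, time to pass) for lengths of 100 bytes and
# above, as listed in Table XIV.
samples = [
    (100, 0.1036), (200, 0.1136), (300, 0.1238),
    (400, 0.1339), (500, 0.1439), (1000, 0.1943),
]


def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(samples)
print(f"time per additional byte: {slope:.6f}, fixed cost: {intercept:.4f}")
```

The fitted slope is roughly 0.0001 per byte, which matches the change of about 0.0100 for every additional 100 bytes in the table. The intercept is an estimate of the fixed PCL overhead.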
VII.  THE CONCLUSION

A.  A SUMMARY OF THE PERFORMANCE MEASUREMENT METHODOLOGY

1.  The Internal Performance Measurement Methodology

The internal performance measurement methodology
provides the strategies and locations for the placement of
checkpoints. It further provides the kinds of performance
data to be collected. This information enables a better
understanding of the target system by measuring certain
capabilities, such as the time spent in individual
processes. Knowing how the system performs internally may
lead to design modifications or to fine-tuning of the system
for increased performance.
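
To make the idea of a checkpoint concrete, the Python sketch below brackets a routine with a start and stop time and records the elapsed time under the routine's name. It is an illustration under our own assumptions, not the instrumentation used in MDBS: `checkpoint`, `checkpoint_log`, and the stand-in routine `parse_traffic_unit` are all hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical checkpoint log: routine name -> list of elapsed times (seconds).
checkpoint_log = defaultdict(list)


@contextmanager
def checkpoint(routine_name, clock=time.perf_counter):
    """Time the enclosed block and record the result under routine_name."""
    start = clock()
    try:
        yield
    finally:
        checkpoint_log[routine_name].append(clock() - start)


def parse_traffic_unit(request):
    # Stand-in for a real message-handling routine; it only splits a string.
    with checkpoint("Request Preparation / Parse Traffic Unit"):
        return request.split()


parse_traffic_unit("sample request text")
for name, times in checkpoint_log.items():
    print(f"{name}: {len(times)} call(s), mean {sum(times) / len(times):.6f} s")
```

Placing one such pair of checkpoints around each message-handling routine yields per-routine times of the kind reported in the internal performance results.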

2.  The External Performance Measurement Methodology

The external performance measurement methodology
provides the strategies for a macro view of the database
system performance by measuring the system as a whole. We
focus on the measurement of the response time of the target
system after the issuance of a request. A test database and
a test request set are generated using software tools.

3.  Combining the Internal and External Measurement
    Methodologies

The natural combination of the internal and external
performance measurement methodologies is synergistic in the
amount of information that is provided. The overhead
incurred when using internal performance measurement code is
accurately determined using this methodology combination.
The external performance measurement timings can be properly
interpreted using the internal performance measurement
results. By combining the two measurements, the whole of
the measurement results is more meaningful and useful than
the individual results.
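
One plausible reading of how the combination exposes the measurement overhead is to compare external response times taken with and without the internal checkpoints in place. The Python sketch below illustrates that reading only; it is our assumption rather than a procedure stated in this chapter, the function name is hypothetical, and the sample numbers are invented for illustration and are not MDBS measurements.

```python
from statistics import mean


def measurement_overhead(times_with_checkpoints, times_without_checkpoints):
    """Mean external response time with internal checkpoints minus without."""
    return mean(times_with_checkpoints) - mean(times_without_checkpoints)


# Illustrative numbers only; they are not measurements from MDBS.
with_checkpoints = [3.10, 3.14, 3.12]
without_checkpoints = [3.00, 3.02, 2.98]

overhead = measurement_overhead(with_checkpoints, without_checkpoints)
print(f"estimated overhead per request: {overhead:.2f} s")
```

A negative result would mean the instrumented system responded faster, which would point to some other difference between the two builds rather than to a true negative overhead.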

B.  A SUMMARY OF THE METHODOLOGY APPLICATION

Thrilling and unexpected results are collected when this
methodology is applied to a target system, i.e., MDBS.
First, the methodology proves itself to be successful in
attempting to verify the performance and capacity claims of
MDBS. This results from being able to collect sufficient
data on a target system to make definitive statements
concerning its performance. The application of this
methodology to MDBS is also surprisingly easy.

A second result is that the performance and capacity
claims of MDBS have been validated. These claims are: 1)
that by increasing the number of backends used as a part of
the database system and by keeping the size of the database
constant, the response time of the same transactions is
proportionally decreased, and 2) that by increasing the
number of backends and also increasing the size of the
database, the response time remains relatively constant.
These claims are validated by the results given in
Chapter VI.
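
As a rough illustration of the first claim, the Python sketch below computes a response-time improvement as the percentage decrease in response time relative to the one-backend case. That definition is our assumption for the sketch, the function name is hypothetical, and the input times are invented round numbers rather than measurements.

```python
def response_time_improvement(time_one_backend, time_n_backends):
    """Percentage decrease in response time relative to one backend.

    This definition is assumed here for illustration.
    """
    return 100.0 * (time_one_backend - time_n_backends) / time_one_backend


# Ideal case for claim 1: doubling the backends halves the response time.
ideal = response_time_improvement(10.0, 5.0)
print(f"ideal improvement with two backends: {ideal:.2f}")
```

Under this reading, a perfectly proportional decrease with two backends would give an improvement of 50, so measured improvements can be judged by how closely they approach that figure.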

These spectacular results provide a wealth of information
from which several conclusions can be made. We find
that under MDBS, the response-time improvement increases as
the number of records retrieved increases. Also, the
response-time reduction decreases as the number of records
retrieved increases. Though the performance measurement
results indicate an improvement in the response time of the
requests when the internal performance measurement software
is part of MDBS code, it is felt that this phenomenon is the
result of differing system overlays and that the induced
overhead of internal measurement code still needs to be
calculated.

The results of the internal performance measurements
indicate that the controller processes, i.e., Request
Preparation and Post Processing, spend very little time
processing the retrieval request. The results obtained from
Concurrency Control are both consistent and of short
duration, as expected. The results also show that the majority
of work is being done in Record Processing and that the
addition of a backend reduces the record processing time by
nearly half. We discovered that it takes the backend twice
as long to process a message as it does the controller,
possibly due to hardware processor speed. Finally, there
exists a linear relationship between the message length and
the time to pass a message once the message length exceeds
100 bytes.

C.  RECOMMENDATIONS FOR FUTURE EFFORTS

Future improvements can be made in the performance
measurement methodology by the automation of the existing
external software tools. Specifically, the ability to start
a test which will execute a pre-determined set of requests a
pre-determined number of times for each request, and collect
the results in a file, is a desirable feature.
Additionally, since the methodology is intended to be
general in use, the methodology needs to be applied to
different database systems to discover its applicability,
ease of use, and usefulness in overall performance
measurement of the target system.
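
The automated test described above could take a shape like the Python sketch below. It is a hedged illustration only: `run_test` is a hypothetical helper, `execute` stands for whatever callable submits one request to the target system, and the CSV column names are our own choice.

```python
import csv
import time


def run_test(requests, repetitions, execute, out_path):
    """Run each request `repetitions` times and write one CSV row per run."""
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["request", "run", "response_time_seconds"])
        for request in requests:
            for run in range(1, repetitions + 1):
                start = time.perf_counter()
                execute(request)  # hypothetical: submit the request and wait
                writer.writerow([request, run, time.perf_counter() - start])
```

A call such as `run_test(request_set, 10, submit_request, "results.csv")`, where `request_set` and `submit_request` are supplied by the tester, would then leave one timing per run in the file for later analysis.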

In terms of the application of this methodology to MDBS,
a complete and thorough test of the system needs to be
conducted. An exhaustive test of MDBS would include
conducting tests with databases that have varying record
sizes. Further, testing the system by varying the number of
directory attributes, descriptors, and clusters would
indicate the role of the directory data in the system. Insert,
delete, and update requests must also be measured to
discover their impact on system performance. Lastly, the
measurement should be extended to test MDBS when it uses the
secondary-memory-based directory management process.




INITIAL DISTRIBUTION LIST

                                                  No. Copies

Defense Technical Information Center                   2
Cameron Station
Alexandria, Virginia 22314

Dudley Knox Library, Code 0142                         2
Naval Postgraduate School
Monterey, California 93943

Department Chairman, Code 52                           6
Department of Computer Science
Naval Postgraduate School
Monterey, California 93943

Commandant of the Marine Corps                         1
Code CC
Headquarters, Marine Corps
Washington, D.C. 20380

Office of Research Administration                      1
Code 012A
Naval Postgraduate School
Monterey, California 93943

Computer Technologies Curricular Office                1
Code 37
Naval Postgraduate School
Monterey, California 93943

Robert Tekampe                                         2
13913 Gum Lane
Woodbridge, Virginia 22193

Robert Watson                                          2
3481 Lyon Park Court
Woodbridge, Virginia 22192


210813

Thesis
T237    Tekampe
c.1         Internal and external performance measurement
            methodologies for database systems.



